Query: 130 TPYTINGSVLIVNNELAKGLTIKSYEDLLQPSLKGKIAFMPNTSSSAFSQLTNILIAKG 189

T ++ S+L+VN LA + 1+ YEDLL P LKGKIA ADP+ SSSAF L N+L A G 
Sbjct: 122 TRFSAIPSILMVlTOIIAGNIKIEGYEDLIiNPELKGKIAAADPSASSSAFEHLVNMLYAMG 181 

Query: 190 GYTNPKAWNYVKKLQHNINAIKSSSSSEVYQSVAEC-KI»IIVGLTYEDPSWLQKSGANVSI 249 

K W+YV+KL N++ S SS VY+ VA+G+ VGLTYE+P ++ SG+ V + 
Sbjct: 182 KGDPEKGWDWQKIjCANLDGKLLSGSSAVYKGVADC-EYTVGLTYEEPGISYMSSGSPVJCV 241 

Query: 250 VYPTEGTVFVPSSVAIIKNAPSMKEAKLFINFMLSLDVQNAFGQSTSNRPIRKDAQTSNG 309 

+Y EG + P V I IK +++ AK FI++ +SLD ON + S R IR DA ++ 
Sbjct: 242 IYMKEGVISKPDGVYIIKGGKNLENAKKFIDYCVSLDAQNMLVEKLSRRSIRSDAWTDM 301 

Query: 310 MKALKDIATLKEDYRYVTKHKGQILKTYNRI 340 

+K + +1 ++ ++ V + + + L + I 
Sbjct: 302 VKPMSE I YS I TDNADWEESRQKWLDKFKDI 332 

A related DNA sequence was identified in S. pyogenes <SEQ ID 1617> which encodes the amino acid 
sequence <SEQ ID 1618>. Analysis of this protein sequence reveals the following: 
Possible site: 33 

n signal seq 

Transmembrane 9 - 25 ( 4 - 33)

Final Results 

bacterial membrane --- Certainty=0.6265 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:AAB95371 GB:U75349 periplasmic-iron-binding protein BitC 
[Brachyspira hyodysenteriae] 
Identities = 115/324 (35%) , Positives = 177/324 (54%) , Gaps = 8/324 (2%) 

Query: 15 VIIILAIVNVAMYIFSSSKKDSAKELVILTPNSQTILTGTIPAFEEKY-GVKVRL 68

+++I + ++++IF S S S LVI P+ + + F+ K G+ V +

Sbjct: 4 IVLIFTSLLLSVFIFYSCSSSESGAQSGNSLVIYCPHPLEFINPLVDDFKAKNPGINVDI 63

[Blocks starting at Query 69, 128, 188, 248 and Sbjct 64, 123, 183, 243: sequence lines not recoverable; the surviving match lines follow]

3 PL DI +GG

T T S+L+VN LA + I YEDLL P LKGKIAFADP++SSS+F L N+L A

W Y+ +L N++ + SS VY+ VA+G+ VGLT+E+

I- IIK+A N+ AK F+++ S D Q

Query: HDMKALETIATLKEDYAYVTKHKK 331

+++++TI + +D A V ++K+
Sbjct: 303 AILQSVDTINVITDDEAWDQNKQ 326

An alignment of the GAS and GBS proteins is shown below: 

Identities = 257/345 (74%), Positives = 295/345 (85%), Gaps = 1/345 (0%) 

Query: 1 MKEKQSKRLIYILLWSIIFISVFTYSISQPSICLLPPKELVILSPNSQAILTGTIPAFEE 60 

+K K+ L ++L+++ + ++V Y S SK KELVIL+PNSQ ILTGTIPAFEE 
Sbjct: 2 LKLKRKWLLSFLLVIIILAIVNVAMYIFSS-SKKDSAKELVILTPNSQTILTGTIPAFEE 60 
[Blocks starting at Query 61, 121, 181, 241 and Sbjct 61, 121, 241, 301: sequence lines not recoverable; the surviving match lines follow]

KYG+KV+LIQGGTGQLID+L ++ K L ADIFFGGNYTQFESHK LFESYVS V TVI

ATPYTINGSVLIVNNELA+GL I SYEDLLQP+LKGKIAFADPN+SSSAFSQ

LTNILLAKGGYTN AW Y+K+L N+N+ 1 + ++ S8SEVYQ3VAEGKMI VGLTYEDP +NL

QKSGANVSIVYP EGTVFVPSSVAIIK+AP+M EAKLFINFMLS DVQNAFGQSTSNRPI



SEQ ID 1616 (GBS263) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 49 (lane 4; MW 63kDa). 

The GBS263-GST fusion product was purified (Figure 205, lane 5) and used to immunise mice. The 
resulting antiserum was used for FACS (Figure 301), which confirmed that the protein is immunoaccessible 
on GBS bacteria. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
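
The identity and similarity figures quoted in the alignment summaries of this Example can be recomputed from the raw counts. The following is a minimal sketch in Python, not part of the original analysis; the function name `parse_summary` and the regular expression are illustrative assumptions, and the only input is a summary line in the form printed above.

```python
import re

# Matches "Identities = 257/345", "Positives = 295/345", "Gaps = 1/345"
# as they appear in the alignment summary lines of this document.
SUMMARY_RE = re.compile(r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)")


def parse_summary(line):
    """Return {field: (count, aligned_length, percent)} for one summary line."""
    stats = {}
    for name, count, length in SUMMARY_RE.findall(line):
        count, length = int(count), int(length)
        stats[name] = (count, length, 100.0 * count / length)
    return stats


if __name__ == "__main__":
    # Summary line of the GAS/GBS alignment reported above.
    line = "Identities = 257/345 (74%), Positives = 295/345 (85%), Gaps = 1/345 (0%)"
    for name, (count, length, pct) in parse_summary(line).items():
        print(f"{name}: {count}/{length} = {pct:.1f}%")
```

The percentages printed in the summaries are whole numbers, so the recomputed values agree with them only to the nearest integer.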

Example 506 

A DNA sequence (GBSx0544) was identified in S.agalactiae <SEQ ID 1619> which encodes the amino
acid sequence <SEQ ID 1620>. This protein is predicted to be a response regulator. Analysis of this protein
sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

----- Final Results -----

bacterial cytoplasm --- Certainty=0.4733 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:AAF31452 GB:AF221126 putative response regulator [Streptococcus pneumoniae] 
Identities = 85/252 (33%), Positives = 147/252 (57%), Gaps = 17/252 (6%) 

Query: 2 YRLLIVEDEHLIRKWLRYAIDYQSLNILWGEAKDGKEGAQLIQEEQPDIVLSDINMPIM 61
Y +LIVEDE+L+R+ L ++ + ++ ++G+A++G++ +LIQ++ PDI+L+DINMP +

Sbjct: 3 YTILIVEDEYLVRQGLTKLVWAAYDMEIIGQAENGRQAWELIQKQVPDIILTDINMPHL 62

Query: 62 TAFDMFEATKGQSYAKIILSGYADFPNAQSAIHYGVLEFLTKPLEKQALIDCLKTIM 118

+ + ++Y + + L+GY DF A SA+ GV ++L KP +Q + + L I
Sbjct: 63 NGIQLASLVR-ETYPQVHLVFLTGYDDFDYALSAVKLGVDDYLLKPFSRQDIEEMLGKIK 121

Query: 119 ARIE - EHKEKHIX3EHTEIIYIJPLPQANDQVPEVIKD^ILAWIHSHFHGKI VI SQLAHDLGYS 177

+++ E KE+ LQ+ L + + + 1+ IA + + LA DLG+S

Sbjct: 122 QKLDKEEKEEQLQD LLTNRFEGNMAQKIQSHLA- DSQFSLKSLASDLGFS 170

Sbjct: 171 PTYLSSLIKKELGLPFQDYLWERVKQR-KLLLLTTDLKIYEIAEKVGFEDMNYFTQRFK 229

Query: 238 KYLGQTVKAFKE 249

+ G T + FK+
Sbjct: 230 QIAGVTPRQFKK 241

A related DNA sequence was identified in S.pyogenes <SEQ ID 1621> which encodes the amino acid 
sequence <SEQ ID 1622>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4239 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 193/257 (75%) , Positives = 226/257 (87%) 



Query: 61 MTAFDMFEATKGQSYAKIILSGYADFPNAQSAIHYGVLEFLTKPLEKQALIDCLKTIMAR 120

MTAFDMFE TK Q+YAKIILSGYADFPNA+SAIHYGVLEFLTKP+EK AL +CL+TI+A+
Sbjct: 61 MTAFDMFEVTKDQTYAKIILSGYADFPNARSAIHYGVLEFLTKPIEKAALWECLQTIIAK 120

Query: 121 IEEHKEKHLQEHTELYLPLPQAWDQVPEVIKDMLAWIHSHFHGKIVISQLAHDLGYSESY 180

IE+ K + + +Y+PLPQ DQ+PEV+KD+L W+H+HF KI S+LAHDLGYSESY
Sbjct: 121 IEKQKGSNOKTDAC^IPLPQMTDQIPEVVKDIIiEWVHAHFQDKISTSRIjAHDLGYSESY 180

Query: 181 LYTVTKKHLHITLSDYINQYRINQAIQLMFREPDLMVYQIAEAVGIYDYRYFDRVFKKYL 240

+Y KKHL + LSDYINQYRINQAIQLM +EPDLMVY+ IA+AVGI YDYRYFDRVFKKYL
Sbjct: 181 IYQNIKKHLQMPLSDYINQYRINQAIQLMQQEPDLMVYEIAQAVGIYDYRYFDRVFKKYL 240



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
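
The localisation call in each "Final Results" listing is the compartment with the highest certainty value. The sketch below, in Python, shows one way such a listing could be read programmatically; it is an illustration only, the names `parse_final_results` and `predicted_location` are hypothetical, and the parser is written on the assumption that the certainty value may contain stray spaces.

```python
import re

# One "Final Results" line: compartment name, then "Certainty=<value>".
# The value may contain stray spaces (e.g. "0 . 423 9"), which are removed.
RESULT_RE = re.compile(r"bacterial\s+(\w+)\W*Certainty\s*=\s*([0-9][0-9 .]*)")


def parse_final_results(text):
    """Map each compartment named in the listing to its certainty value."""
    results = {}
    for line in text.splitlines():
        match = RESULT_RE.search(line)
        if match:
            results[match.group(1)] = float(match.group(2).replace(" ", ""))
    return results


def predicted_location(results):
    """Return the compartment with the highest certainty."""
    return max(results, key=results.get)


if __name__ == "__main__":
    # Values taken from the S.pyogenes listing of this Example.
    listing = """
    bacterial cytoplasm --- Certainty=0.4239 (Affirmative) < succ>
    bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
    bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
    """
    results = parse_final_results(listing)
    print(predicted_location(results), results)
```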

Example 507 

A DNA sequence (GBSx0545) was identified in S.agalactiae <SEQ ID 1623> which encodes the amino 
acid sequence <SEQ ID 1624>. Analysis of this protein sequence reveals the following: 



----- Final Results -----

bacterial cytoplasm --- Certainty=0.2964 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 508

A DNA sequence (GBSx0546) was identified in S.agalactiae <SEQ ID 1625> which encodes the amino
acid sequence <SEQ ID 1626>. This protein is predicted to be a two-component sensor histidine kinase.
Analysis of this protein sequence reveals the following:

Possible site: 45 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood =-13.80 Transmembrane 266 - 282 ( 257 - 285) 
INTEGRAL Likelihood =-12.90 Transmembrane 29 - 45 ( 24 - 51) 



Final Results 

bacterial membrane --- Certainty=0.5519 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10197> which encodes amino acid sequence <SEQ ID 
10198> was also identified. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:BAB05628 GB:AP001513 two-component sensor histidine kinase 
[Bacillus halodurans] 
Identities = 84/258 (32%) , Positives = 138/258 (52%) , Gaps = 23/258 (8%) 

Query: 298 SSAINQMVLDMDAI SRQEKSS IELDSQDEFQTLSVQINQMVSRLKDLHEKTLDLETQKLL 357 

S INQ+ S K+ I +D +DE LSVQ NQMV+ L+ L + + QK L 

Sbjct: 327 SERINQVA SGDLKTKIVVDGKDEIGQLSVQFNQMVANLRSLIHQVHETNRQKRL 380 

Query: 358 FEK — RMLEAQFNPHFLYNTLETILITSHYDSQL-TERIVIQLTKLLRYSLSGST 409 

EK +ML +Q NPHFL+NTLE+I + SH + ++V QL KL+R SL + 

Sbjct: 381 LEKSQNEIKLKMIASQINPHFLFNTLESIRMKSHMKGETEIAKVVKQLGKLMRKSLEVTG 440 

Query: 410 EAAVLKDDIAIIESYIilNO^F-EELTrTISVSPELEHMRVPKLFLLPLIENAIKyGLK 468 

L+++L ++ YLI R++LY++P+E++ L + PL+ENA+ +GL+ 
Sbjct: 441 HHIPLRNELDMVRCYLEIQTFRYGDRLHYELYIDPQSEMVEILPLIIQPLVENAVIHGLE 500 

Query: 469 ERHD-VAINIDIWQDSDGIWFTVSNNGSGISIARQQAIRTMLRSTH SHHGLINSYR 523 

D +1 + + + V+++G G+ + +AI+ ML + GL+K ++ 

Sbjct: 501 RTEDGGTVTISTIWGNDLTOIVNDDGCGMDEEKLEAIQNMLHHPQEVDGNKIGLLNVHK 560 

Query: 524 RLQYQF STVLLEFTK 538 

RLQ + S +++E K 
Sbjct: 561 RLQLTYGKTSGLI IESAK 578 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1627> which encodes the amino acid 
sequence <SEQ ID 1628>. Analysis of this protein sequence reveals the following: 
Possible site: 43 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood =-10.88 Transmembrane 27 - 43 ( 22 - 49) 
INTEGRAL Likelihood = -9.08 Transmembrane 263 - 279 ( 258 - 282) 

Final Results 

bacterial membrane --- Certainty=0.5352 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:BAB05628 GB:AP001513 two-component sensor histidine kinase 
[Bacillus halodurans] 
Identities = 85/270 (31%), Positives = 139/270 (51%) , Gaps = 20/270 (7%) 
Query: 276 IFVILQRKSSGLANRIAAKNSRAIICWSDMSftlSRQEKRRIDLZSQDEFQYLSDQINQM 335

+ V+L S L ++ + S INQ+ S K +1 ++ +DE LS Q NQM 

Sbjct: 307 VAVLLIVHFSWLISKRLSHLSERINQVA SGDLKTKIWDGKDEIGQLSVQFNQM 360

Query: 335 VERLQQLHDKTLDLETQKLLFEK RMLEAQFNPHFLYKTLETIUTSHYDSaL- 387 

V L+ L + + QK L EK +ML +Q NPHFL+NTLE+I + SH 

Sbjct: 351 VANLRSLIHQVHETNRQKRLLEKSQNEIKLKMliASQINPHFLFNTLESIRMKSHMKGETE 420 

Query: 388 TEKIVIQLTKLLRYSLTDSSKPVLLKDDLSVIESYLVINQVRF-EELQYSINLSPDLDSL 446 

K+V QL KL+R SL + + L+++B ++ YLI R+ + LY + + P + + 
Sbjct: 421 IAKWKQLGKLMRKSLEVTGHHIPLRNELDMVRCYLEIQTFRYGDRLHYELYIDPQSEMV 480 

Query: 447 EVPKLFLLPLIENAIKYGLKERHD-VKINIACYYQDDHIIFSVRDNGSGIDAHHQKV1RE 505 

E+ L + PL+ENA+ +GL+ D + 1+ + + V D+G G+D + 1 + 

Sbjct: 481 EILPLIIQPLVENAVIHGLERTEDGGTVTISTIVNGNDLTVIVNDDGCGMDEEKLEAIQN 540 

Query: 506 QL EAGESHHGLINSYRRLKYHFSEVS 531 

L E + GL+N 4-+RL+ + + S 
Sbjct: 541 MLHHPQEWDGNKIGLLNVHKRLQLTYGKIS 570 

An alignment of the GAS and GBS proteins is shown below: 

Identities = 369/549 (67%) , Positives = 449/549 (81%) 

Query: MRGYRMEERFKKRLQDDISKHFSRQSLILSLLLIALFVLFSLAPQQIGLYKDVNSVSYSY 62
MRG ++EE FKK+LQDDIS+HFS QSL+LSLEIiI LF++FSLAPQQ+GLY+D+N+ + Y
Sbjct: MRGEQVEEHFKKQLQDDISRHFSYQSLMLSLLLIGLFIIFSLAPQQLGLYRDINATATRY 60

[Blocks starting at Query 123, 183, 243, 303, 363, 423, 483, 543 and Sbjct 61, 121, 181, 241, 301, 361, 421, 481, 541: sequence lines not recoverable; the surviving match lines follow]

ASN HLGN FSKS+YI EVL - L K +DSE GHYL +1 P+I

+M+GKDFL PTK + S+L+IAD+L+N+FTF+KR+FI+SSLDK++SQ+L YF F D+RAF

1- LYMYRPLIP+ V+LFSL+SS +IFVIL++KS LA+RIA KNS AIN

QMV DM AISRQEK I+L+SQDEFQYLS QINQMV RL+ LH+KTLDLETQKLLFEKRM

LEAQFNPHFLYNTLET1LITSHYDS LTE-IVIQLTKLLRYSL+ S++ +LKDDL++IE

SYL+IWQVRFEEL Y+I++SP+L+ + VPKLFLLPLIENAIKYGLKERHDV INI



A related GBS gene <SEQ ID 8587> and protein <SEQ ID 8588> were also identified. Analysis of this 
protein sequence reveals the following: 
Lipop: Possible site: -1 Crend: 10

McG: Discrim Score: 6.23 

GvH: Signal Score (-7.5): -0.0500002 

Possible site: 38 
>>> Seems to have a cleavable N-term signal seq.
ALOM program count: 1 value: -13.80 threshold: 0.0 

INTEGRAL Likelihood =-13.80 Transmembrane 259 - 275 ( 250 - 278) 
PERIPHERAL Likelihood = 2.70 404 
modified ALOM score: 3.26 

*** Reasoning Step: 3 



Final Results 

bacterial membrane --- Certainty=0.6519 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

33.2/53.9% over 181aa

Streptococcus pneumoniae
GP|5830535| histidine kinase Insert characterized

ORF00032 (1309 - 1848 of 2253)
GP|5830535|emb|CAB54576.1||AJ006396 (1 - 182 of 231) histidine kinase {Streptococcus pneumoniae}
%Match = 5.9

%Identity = 33.2 %Similarity = 53.8

Matches = 61 Mismatches = 78 Conservative Sub.s = 38



1293 1323 1350 1380 1410 1440 1494 

DSQDEFQYI,SVQIWQMVSRL-KDLHEKTLDLETQKLLFEKRMLEAQFN?KFLYNTLETILITSHYDSO--LTERIVIQLT 

h II h = h II : I 1 = 11 lllhlllll = = = || |.« |. ... . 

MLDRLEKNIHD-IYQLELSQKDANMRALQAQINPHFMYNTLEFLRMYAVMQSQDELAD-IIYEFS 
10 20 30 40 50 60 

1524 1554 1584 1611 1641 1671 1701 1728 

KLLRYSLSGSTEAAVLKDDLAIIESYLLINQVRF-EELTYTISVSPELEHMRVPKLFLLPLIENAIKYGLKERH-DVAIN 
Ml ::| =11 =1 | : ||: : : | : ||||=|=.||: I ||=|| :|: | I |= 

SLLRNN1S-DERETLLKQELEFCRKYSYLCMVRYPKSIAYGFKIDPELENMKIPKFTLQPLVENYFAHGVDHRRTDNVIS 
80 90 100 110 120 130 140 

1758 1788 1818 1848 1878 1908 1938 1968 

IDIWQDSDGIWFTVSNNGSGISLARQQAIRTMLRSTHSHHGLINSYRRLQYQFSTVLl^FTKTDDAFRVSYIvKE*VT!Cffi 

I = = I :|| hi » II I = I I =1 II I 

IJCALKQDGFVEILVVDNGRGMSAEKLANIREKLSQRYFEHQASYSDQRQSIGIVNVHERFVLYFGDRYAITIESAEQAGV 
160 170 180 190 200 210 220 

SEQ ID 8588 (GBS47) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 14 (lane 2; MW 84kDa). It was also expressed in E.coli as a His-fusion product. 
SDS-PAGE analysis of total cell extract is shown in Figure 85 (lane 4; MW 59.3kDa). 
GBS47-His was purified as shown in Figure 221, lane 4-5. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
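
The predicted transmembrane regions in the listings above are reported as "INTEGRAL" lines giving a likelihood score, the segment boundaries and, in parentheses, the wider flanking region. A minimal Python sketch for extracting these fields is given below; it is illustrative only, the names `Segment` and `parse_segments` are hypothetical, and the pattern assumes the line layout printed in this Example.

```python
import re
from typing import List, NamedTuple


class Segment(NamedTuple):
    likelihood: float  # more negative values indicate a stronger prediction
    start: int         # first residue of the predicted segment
    end: int           # last residue of the predicted segment

    @property
    def length(self) -> int:
        return self.end - self.start + 1


# Layout assumed: "INTEGRAL Likelihood =-13.80 Transmembrane 266 - 282 ( 257 - 285)"
TM_RE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+(\d+)\s*-\s*(\d+)"
)


def parse_segments(text: str) -> List[Segment]:
    """Return the predicted segments, strongest prediction first."""
    segments = [
        Segment(float(score), int(start), int(end))
        for score, start, end in TM_RE.findall(text)
    ]
    return sorted(segments, key=lambda seg: seg.likelihood)


if __name__ == "__main__":
    # The two lines reported for SEQ ID 1626 in this Example.
    listing = """
    INTEGRAL Likelihood =-13.80 Transmembrane 266 - 282 ( 257 - 285)
    INTEGRAL Likelihood =-12.90 Transmembrane 29 - 45 ( 24 - 51)
    """
    for seg in parse_segments(listing):
        print(seg.start, seg.end, seg.length, seg.likelihood)
```

Both segments in this listing span 17 residues, as do the other fully legible segments reported in this document.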
Example 509

A DNA sequence (GBSx0547) was identified in S.agalactiae <SEQ ID 1629> which encodes the amino 
acid sequence <SEQ ID 1630>. This protein is predicted to be phosphotransferase enzyme II, D 
component. Analysis of this protein sequence reveals the following: 

Possible site: 32 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood =-10.46 Transmembrane 258 - 274 ( 252 - 274) 

INTEGRAL Likelihood = -9.13 Transmembrane 232 - 248 ( 227 - 251) 

INTEGRAL Likelihood = -5.31 Transmembrane 142 - 158 ( 140 - 161) 

INTEGRAL Likelihood = -2.50 Transmembrane 119 - 135 ( 118 - 139) 

Final Results 

bacterial membrane --- Certainty=0.5182 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAC74389 GB:AE000276 PTS enzyme IID, mannose-specific
[Escherichia coli K12] 
Identities = 94/280 (33%) , Positives = 156/280 (55%) , Gaps = 13/280 (4%) 



[Blocks starting at Query 3, 62, 122, 182, 238 and Sbjct 12, 68, 128, 187, 247: sequence lines not recoverable; the surviving match lines follow]

G +R + A +G +R +GS +G ++F +L+N+

. +T+ +S+LGL ++G++ T N+ G + ++Q+ LDQL G+V

WLLRKKVN WI+ G V+GI



A related DNA sequence was identified in S.pyogenes <SEQ ID 1631> which encodes the amino acid 
sequence <SEQ ID 1632>. Analysis of this protein sequence reveals the following: 

Possible site: 32 
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -9.98 Transmembrane 255 - 271 ( 251 - 274) 

INTEGRAL Likelihood = -7.01 Transmembrane 232 - 248 ( 228 - 250) 

INTEGRAL Likelihood = -5.68 Transmembrane 142 - 158 ( 140 - 161) 

INTEGRAL Likelihood = -2.50 Transmembrane 119 - 135 ( 118 - 139) 

Final Results 

bacterial membrane --- Certainty=0.4991 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:AAC74889 GB:AE000276 PTS enzyme IID, mannose-specific
[Escherichia coli] 

Identities = 94/281 (33%) , Positives = 157/281 (55%) , Gaps = 13/281 (4%) 



Query: 2 TSQDNLTKEDRKMLRSVFWRSWTMNASRTGATQYHAVGVIYTLLPVINRFYKTDKD-KAE 60
T++ LT+ D +R VF RS S + A+G ++++P I R Y
Sbjct: 11

[Blocks starting at Query 61, 121, 181, 237 and Sbjct 67, 127, 186, 246: sequence lines not recoverable; the surviving match lines follow]

I+G+ 4+E++ + + D AI +K LMGP++GVGD F

WG +R + A +G +A +GS +G ++F +L+N+ YY + GYS G

^ +T+ +S+LGL ++G++

VPL+4-T A WLLRKKVN +WI + G 4-GI



An alignment of the GAS and GBS proteins is shown below: 

Identities = 263/275 (95%) , Positives = 269/275 (97%) 

Query: 1 MKSQDNLTKEDRKMLRSVFWRSWTMNASRTGATQYHAVGVIYTLLPVINRFYKTDKDKAE 60
M SQDNLTKEDRKMLRSVFWRSWTMNASRTGATQYHAVGVIYTLLPVINRFYKTDKDKAE
Sbjct: 1 MTSQDNLTKEDRKMLRSVFWRSWTMNASRTGATQYHAVGVIYTLLPVINRFYKTDKDKAE 60

Query: 61 ALVRHTTWFNATMHINNFIMGLVASMEKKNSEDPDFDASAITAVKASLMGPISGVGDSFF 120
ALVRHTTWFMATMHIMNFIMGLVASMEKKNSEDPDFDASAITAVKASLMGPISGVGDSFF
Sbjct: 61

Query: 121
WGILRVIAAGIGISLAS GSMGAWFLLLYNIPAF+IHYYSLYGGYSVGAGFIKKLYES
Sbjct: 121 WGILRVIAAGIGISLASAGSAMGAWFLLLYNIPAFIIHYYSLYGGYSVGAGFIKKLYES 180

Query: 181 GGIKIVTKTSSMLGLMMVGSMTASNVKFKTIIjTVAAKGAKEAASIQSYLDQLFVGVVPLL 240
GGIKIVTKTSSMLGLMMVGSMTASNVKFfCTILTVAAKGAKFAASIQ YLDQLF+G+VPL+
Sbjct: 181 GGIKIVTKTSSMLGLMWGSMTASNVKFPCTILWAAKGAiaMSIQDYLDQLFIGIVPLM 240

Query: 241 VTIIAFWLLRKKVNIIIWIMFGIMVLGIVLGIjLGIC 275
VT+ AFWLLRKKVNI WIMFGIM LGI+LGLLGIC
Sbjct: 241 VTLAAFWLLRKKVNIIWIMFGIMFLGIIIX3LLGIC 275

There is also homology to SEQ ID 5236. 

A further related DNA sequence was identified in S.pyogenes <SEQ ID 9077> which encodes the amino 
acid sequence <SEQ ID 9078>. An alignment of the GAS and GBS sequences follows: 



Query: 2 IMEEITIYHHPNCGTSRNVLAI1IRHAGIEPTIIEYLQTPPNRETLIELLQSMGISARELL 61

+ME+I IYHNPNCGTSRNVLA+IRH GIEP II YL+TPP+R L+ELL M +SARELL
Sbjct: 1 ^KIRIYHNPNCGTSRNVLAIIRHCGIEPEIIYYLKTPPSroiELVELLLEMKLSARELL 60

Query: 62 RTOTPEFEAYGIJ^QAVAEKDIIIIAMLADPILINRPIVVTRKGVKLCRPSETLLDILPVP 121

RT+VP +E + L + +V ++++I+AM+ DPILINRPIWT KG KLCRP E +L ILPV
Sbjct: 61 RTDVPAYEKFNLESSSVTDEEMIDAMIQDPILINRPIWTSKGAKLCRPCEAILTILPVK 120

Query: 122 LPSPYIKEDGESVNPI 137

+ ++KEDG+ + +
Sbjct: 121 MEKDFVKEDGQIIQSL 136



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
Example 510

A DNA sequence (GBSx0548) was identified in S.agalactiae <SEQ ID 1633> which encodes the amino 
acid sequence <SEQ ID 1634>. This protein is predicted to be PTS permease for mannose subunit IIPMan. 
Analysis of this protein sequence reveals the following: 



Possible site: 53

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = Transmembrane 144 - 160 ( 140 -
INTEGRAL Likelihood = Transmembrane 220 - 236 ( 215 -
INTEGRAL Likelihood = Transmembrane 95 - 111 ( 91 -
INTEGRAL Likelihood = Transmembrane 2 - 18 ( 1 -
INTEGRAL Likelihood = Transmembrane 180 - 196 ( 179 -
INTEGRAL Likelihood = Transmembrane 32 - 48 ( 30 -
INTEGRAL Likelihood = Transmembrane 198 - 214 ( 198 -

----- Final Results -----

bacterial membrane --- Certainty=0.4482 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAC44680 GB:U65015 PTS permease for mannose subunit IIPMan 
[Vibrio furnissii] 

Identities = 70/251 (27%) , Positives = 132/251 (51%) , Gaps = 6/251 (2%) 



Query: IMPATMAALAVLICFGGNYLTGQSMMERPLWGLVTGMLLGDIKVGILMGASLEALFLGN 61
+ AML + G+ G+ RP+V4G + G++LGD+ GIL+G +LE +++G
Sbjct: LFQAIMLGLIAFLA-GLDLFNGLTHFHRPVV^ 63

Query: 62 WIGGVIAAEPVTATAMATTFTIISNIDQKAA^1T]^VPIGVI^FVWFLKNVFMNIFAP 121

+ G + T + TTF I +N++ A+ +AVP + + L + + +

Sbjct: 64 APIAGAQPPNVIIGTIVGTTFAITTNVEPNVAVGVAVPFAVAVQMGITLLFSAMSAVMSK 123

Query: 122 ^ra3KAAaANHQGKLVMLHYGTWII--YYLIrASISFIGILVGSGPvNSFVHHIPQNL^lNG 179 

+ A A+ +G + ++ ++ +Y + A F+ I +G+ + V +P+ L++G 
Sbjct: 124 CDEYAKNADTRGIERVNYFALAVLGSFYFLCA FLPIYLGADHAGAMVAALPKALIDG 180 

Query: 180 LSAAGGLLPAVGFAMLMKLLWTNKLAVFYLLGFVLTAYLI<LPAVAVAALGAVICVISSQR 239 

L AGG++PA+GFA+LMK++ N +++LGFV A+L+LP +A+ + +1 R 

Sbjct: 181 LGVAGGIMPAIGFAVLMKimKNAYIPYFIMFVTAAAWLQLPIIAIRCAATAMAIIDFMR 240 

Query: 240 DIELDAITRGA 250 

E + A 
Sbjct: 241 KSEPTPVNASA 251 



A related DNA sequence was identified in S.pyogenes <SEQ ID 1635> which encodes the amino acid
sequence <SEQ ID 1636>. Analysis of this protein sequence reveals the following:

Possible site: 56 



>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -8.70 Transmembrane 144 - 160 ( 140 - 165)
INTEGRAL Likelihood = -8.07 Transmembrane 220 - 236 ( 215 - 238)
INTEGRAL Likelihood = -7.27 Transmembrane 95 - 111 ( 91 - 116)
INTEGRAL Likelihood = -4.62 Transmembrane 2 - 18 ( 1 - 19)
INTEGRAL Likelihood = -1.44 Transmembrane 180 - 196 ( 179 - 196)
INTEGRAL Likelihood = -0.98 Transmembrane 32 - 48 ( 31 - 49)
INTEGRAL Likelihood = -0.53 Transmembrane 198 - 214 ( 198 - 214)

----- Final Results -----

bacterial membrane --- Certainty=0.4482 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
The protein has homology with the following sequences in the databases:

>GP:AAC44680 GB:U65015 PTS permease for mannose subunit IIPMan
[Vibrio furnissii] 

Identities = 72/251 (28%) , Positives = 132/251 (51%) , Gaps = 6/251 (2%) 

Query: 2 LVPATMAAIAVLICFGGNYLTGQSMMERPLWGLVTGLLLGDMKVGILMGASLEALFLGN 61
L A M L + G+ G+ RP+V+G + GL+LGD+ GIL4-G +LE +++G
Sbjct: 5 LFOALMLGLLAFLA-GLDLFNGLTHFHRPWLGPLVGLILGDLHTGILVGGTLELIWMGL 63

[Blocks starting at Query 62, 122, 180, 240 and Sbjct 64, 124, 181, 241: sequence lines not recoverable; the surviving match lines follow]

A V +P+ L++G

L AGG+ + PA+GFA+LMK+ + N +++LGFV A+L+LP +A+



An alignment of the GAS and GBS proteins is shown below: 

Identities = 261/269 (97%) , Positives = 268/269 (99%) 

Query: 1 MIMPATMAAIAVLICFGGNYLTGQSMMERPLWGLVTGMLLGDIKVGILMGASLEALFLG 60

M++PATMAftLAVLICFGGNYLTGQSMMERPLWGLVTG+LLGD+KVGILMGASLKaLFLG 
Sbjct: 1 MLVPATMAMjAVLICFGG^LTGQSMMERPLWGLVTGLLLGDMIWGILMGASLEALFLG 60 

Query: 61 1WNIGGVIAAEPOTATAMATTFTIISNIDQKAAMTIAVPIGMLAAFVVMFLKNVFMNIFA 120 

NVNIGGVIAAEPVTATAMATTFT 1 1 S + 1 DQKAAMTIAVPIGMIAAFWMFLKWVFMNI FA 
Sbjct: 61 NWIGGVIAAEPVTATAI^TTFTIISHIDQKAAMTLfiVPIGMIAAFVVMFLKNVFMNIFA 120 

Query: 121 P^IVDKARAANHQGKLV^1LHYGTWIIYYLIIASISFIGILVGSGPVNSFVHHIPQNL^INGL 180 

PMVDKAAAANHQGKLVMLHYGTWIIYYLIIASISFIGILVGSGPVN+FV HIPQNLMNGL 
Sbjct: 121 P^WDKAflAANHQGKLVMLHyGTWIIYYLIIASISFIGILVGSGPVNAFVEHIPQNLMNGL 180 

Query: 181 SARGGLLPAVGFAMLMKLLWTNKLAVFYLLGFVLTAYLKLPAVAVAALGAVICVISSQRD 240 

SAAGGLLPAVGFAMLMKLLWmKIAVFyLICFVLTAYLKLPAVAVAALGAVICVISSQRD 
Sbjct: 181 SAAGGLLPAVGFAMLMKLLWTNKLAVFYLLGFVLTAYLKLPAVAVAALGAVICVISSQRD 240 

Query: 241 IELDAITRGAI SKQTTFDSKESEEEDFFA 269 

+ELDAITRGAI SKQTTFDSKESEEEDFFA 
Sbjct: 241 LELDAITRGAISKQTTFDSKESEEEDFFA 269 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 511 

A DNA sequence (GBSx0549) was identified in S.agalactiae <SEQ ID 1637> which encodes the amino 
acid sequence <SEQ ID 1638>. This protein is predicted to be a PTS system, sorbose-specific IIB component.
Analysis of this protein sequence reveals the following: 



----- Final Results -----

bacterial cytoplasm --- Certainty=0.1874 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>



WO 02/34771 PCT/GB01/04789 

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAA46858 GB:X66059 EIII-B Sor PTS [Klebsiella pneumoniae] 
Identities = 49/158 (31%) , Positives = 94/158 (59%) , Gaps = 8/158 (5%) 

Query: 2 ITQIRVDDRLIHGQVAvWTKEmAPLIaWJ^EARKNEITQMTLKMAVPNGMKLLIRSV 61 

IT R+DDRLIHGQV VW+K NA +++ ND+ +E+ + L+ A P GMK+ + S+ 
Sbjct: 3 ITI^IDDRLIHGQVTTWSK^ANAQRIIICNDDVFNDETORTLLRQAAPPGMKVNVVSL 62 

Query: 62 EESIALFKDPRATDKRI WIWSVKEACTIAKNITDLEAVOTANVGRFDKSDPATKVKLT 121 

E+++A++ +P+ D+ +F + + D T+ + + +N+ + + K +LT 

Sbjct: 63 EKAVAVYHNPQYQDETVFYLFTNPHDVLTMVRQGVQIATLNIGGM AWRPGKKQLT 117 

Query: 122 SSLLLNTEELEAAKELASL-PDLDVFNQVLPSMTKVWL 158 

++ L+ ++++A +EL L LD+ +V+ S+ VN+ 
Sbjct: 118 KAVSLDPQDIQAFRELDKLGVKLDL- -RWASDPSWI 153 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1639> which encodes the amino acid 
sequence <SEQ ID 1640>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.1874 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 145/162 (89%) , Positives = 152/162 (93%) 

Query: 1 MITQIRVDDRLIHGQVAVVWTKELNAPLLWANDEAAKNEITQMTLKmVPNGMKLLIRS 60 

MITQIRVDDRLIHGQVAvVWTKELNAPijLWANDEAAKNEITQMTLKMAVPNGMKLLIRS 
Sbjct: 1 MITQIRVDDRLIHGQVAVVWTKELNAPLLWANDEAAKNEITQMTLKMAVPNGMKLLIRS 60 

Query: 61 VEESIALFKDPRATDKRIFVIWSVKDACTIAKNITDLEAVNVANVGRFDKSDPATKVKL 120 

VE+SI LF DPRA DKRI FVI VNSVKDAC IAK + DLEAVNVANVGRFDKSDPA+KVK+ 
Sbjct: 61 VEDS I KLFNDPRAKDKRI FVI WSVKDACAIAKEVPDLEAVNVANVGRFDKSDPASKVKV 120 

Query: 121 TSSLLLNTEELEAAKELASLPDLDVFNQVLPSNTKVNLSQLV 162 

T SLLIN EE+ AAKEL SLP+LDVFNQVLPSNTKV+LSQLV 
Sbjct: 121 TPSLLIjNPEEMAAAKELVSLPELDVFNQ\'LPSNTKVHLSQLV 162 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 512 

A DNA sequence (GBSx0550) was identified in S.agalactiae <SEQ ID 1641> which encodes the amino 
acid sequence <SEQ ID 1642>. Analysis of this protein sequence reveals the following: 

Possible site: 46

>>> Seems to have no N-terminal signal sequence

■ 104) 



----- Final Results -----

bacterial membrane --- Certainty=0.1489 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has no significant homology with any sequences in the GENPEPT database. 






A related DNA sequence was identified in S. pyogenes <SEQ ID 1643> which encodes the amino acid 
sequence <SEQ ID 1644>. Analysis of this protein sequence reveals the following: 

Possible site: 33 

>>> Seems to have no N-terminal signal sequence

( 87 - 104) 

Final Results 

bacterial membrane --- Certainty=0.1574 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
An alignment of the GAS and GBS proteins is shown below: 

Identities = 115/141 (81%) , Positives = 125/141 (88%) 

Query: 1 MKRKFLIGSHGKLASGLQSSIDILTGKGQEIQTIDAYIDDSDYTKSIVEFIDEIAPDEQG 60 

MKRKFLIGSHG+LASGLQSSIDIL G GQ ++TIDAY+DDSDYT I +FI +A DEQG 
Sbjct: 1 MKRKFLIGSHGRIASGIX3SSIDIIAGMGQALETIDAYVDDSDYTSQIDDFIAGVAADEQG 60 

Query: 61 LIFTDLLGGSWQKMATAvMNSGKNNIFLITNSNLATLLSLLFLKPEEELTKEEIVTVIN 120 

MFTDLLGGSVNQKM TAVMNSGK+NIFLITNSKLATLLSL+FLKP E LTK+EIVTVIN 
Sbjct: 61 LIFTDLLGGSOTQK^lvTAV^raSGKDNIFLITNSNLATLLSLVFLKPGEALTKDEIVTVIN 120 



Query: 121 E 

ESQVQLVDL 

Sbjct: 121 ESQVQLVDLVPETNSEDDFFD 141 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 513 

A DNA sequence (GBSx0551) was identified in S.agalactiae <SEQ ID 1645> which encodes the amino 
acid sequence <SEQ ID 1646>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.2469 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 514 

A DNA sequence (GBSx0552) was identified in S.agalactiae <SEQ ID 1647> which encodes the amino 
acid sequence <SEQ ID 1648>. This protein is predicted to be a racemase. Analysis of this protein sequence
reveals the following: 

>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = Transmembrane - 335 ( 316 -
INTEGRAL Likelihood = Transmembrane - 34 ( 17 -
INTEGRAL Likelihood = Transmembrane - 246 ( 227 -
INTEGRAL Likelihood = Transmembrane - 270 ( 254 -
INTEGRAL Likelihood = Transmembrane - 126 ( 110 -
INTEGRAL Likelihood = Transmembrane - 177 ( 156 -
INTEGRAL Likelihood = Transmembrane - 148 ( 132 -
INTEGRAL Likelihood = Transmembrane - 302 ( 286 -
INTEGRAL Likelihood = Transmembrane - 69 ( 52 -

----- Final Results -----

bacterial membrane --- Certainty=0.4461 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAF71283 GB:AF253562 racemase [Enterococcus faecalis] 
Identities = 78/262 (29%) , Positives = 129/262 (48%) , Gaps = 29/262 (11%) 





Query: 13 KQHNTSMISLLQYLFSILVILVHSGRLFS-QDVIHFTFKSFLGRMAVPYFLICTAFFLRG 71
K + S I +++ ++L++ +H+ LFS + +F F + +AVP+F + + FFL
Sbjct: 3 KNESYSGIDYFRFIAALLIVAIHTSPLFSFSETGNFIFTRIVAPVAVPFFFMTSGFFL-- 60

Query: 72 RIQQGLCNHSYFRKLIKK YSMWTIIYLPY GYFFFESLNIAKIYLLPGFIVAF 123
I + CN IKK Y + ++Y+P GYF ++L LP 1
Sbjct: 61 -ISRYTCNAEKLGAFIKKTTLIYGVAILLYIPINVYNGYFKMDNL LPNIIKDI 112

Query: 124 LYLGMSHTLWYIPAVILGWVIIQGLLKYVGTRGTFITVWLYCIGAV-ETYSVFIQSTKF 182
++ G 4- LWY+PA I+G I L+K V R F+ +LY IG ++Y ++S
Sbjct: 113 VFDGTLYHLWYIiPASIIGaaiAVWLVKKVHYRKAFLIASILYIIGLFGDSYYGIVKSVSC

Query: 183 YPLMSTYMSIFQT- - -TRKGLFYTPVYLIiAGYLIiYDYFNTDLFTKSRGLK-YILFLLLLA 238
L Y IFQ TRNG+F+ P++ + G + D + + + K ++YLFL+
Sbjct: 173 --LNVFYJNLIFQLTDYTRNGIFFAP1FFVLGGYISD- -SPNRYRKKNYIRIYSLFCLMFG 228



Query: 239 LENVLIYFN-QGLDKNFFLLAP 259 

L +F+ Q D + LL P 
Sbjct: 229 KTLTLQHFDI QKHDSMYVLLLP 250 



No corresponding DNA sequence was identified in S.pyogenes. 

A related GBS gene <SEQ ID 8589> and protein <SEQ ID 8590> were also identified. Analysis of this
protein sequence reveals the following:

Lipop: Possible site: -1 Crend: 7 

McG: Discrim Score: 0.23

GvH: Signal Score (-7.5): -5.77
Possible site: 34

>>> Seems to have an uncleavable N-term signal seq

ALOM program count: 3 value: -5.68 threshold: 0.0

INTEGRAL Likelihood = -5.68 Transmembrane 41 - 57 ( 38 - 59)
INTEGRAL Likelihood = -3.98 Transmembrane 65 - 81 ( 65 - 82)
INTEGRAL Likelihood = -1.33 Transmembrane 97 - 113 ( 97 - 113)
PERIPHERAL Likelihood = 5.78 10
modified ALOM score: 1.64

*** Reasoning Step: 3 



----- Final Results -----

bacterial membrane --- Certainty=0.3272 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



A related GBS gene <SEQ ID 8591> and protein <SEQ ID 8592> were also identified. Analysis of this
protein sequence reveals the following:



Lipop: Possible site: -1 Crend: 5
McG: Discrim Score: 11.50
GvH: Signal Score (-7.5): -2.69
Possible site: 32

>>> Seems to have an uncleavable N-term signal seq
ALOM program count: 9 value: -8.65 threshold: 0.0

INTEGRAL Likelihood = -8.65 Transmembrane 310 - 326 ( 307 - 330)
INTEGRAL Likelihood = -6.10 Transmembrane 9 - 25 ( 8 - 28)
INTEGRAL Likelihood = -5.68 Transmembrane 221 - 237 ( 218 - 239)
INTEGRAL Likelihood = -3.98 Transmembrane 245 - 261 ( 245 - 262)
INTEGRAL Likelihood = -3.56 Transmembrane 101 - 117 ( 101 - 120)
INTEGRAL Likelihood = -3.19 Transmembrane 152 - 168 ( 147 - 168)
INTEGRAL Likelihood = -1.97 Transmembrane 123 - 139 ( 123 - 144)
INTEGRAL Likelihood = -1.33 Transmembrane 277 - 293 ( 277 - 293)
INTEGRAL Likelihood = -0.59 Transmembrane [partly illegible: 43, - 60)]
PERIPHERAL Likelihood = 5.78 190
modified ALOM score: 2.23

*** Reasoning Step: 3

Final Results

bacterial membrane --- Certainty=0.4461 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases: 



ORF00153 (307 - 1140 of 1632)
GP|7960293|gb|AAF71283.1|AF253562_7|AF253562 (2 - 284 of 711) {Enterococcus faecalis}
%Match = 8.5
%Identity = 32.7 %Similarity = 54.0
Matches = 91 Mismatches = 113 Conservative Sub.s

150 180 210 240 270



CEISFFIS*YG**GIM^QIPFKAFQ*LFGIIEIFF*RDIWHSNDNL*IO^LI^KRSQCTONKQHNTSMISLLQYLFSI 

| : | | :::: :: 



LVILVHSGRLFS-QDVIHFTFKSFLGRMAVPYFLICTAFFLRGRIQQGLCNHSYFRKLIKK YSMWTI IYLP Y- 

!:: :|: III =. :| I = = I I H = : III I = II = =111 I : :«|s| I 
LIVAIHTSPLFSFSETGNFIFTRIVAPVAVPFFFMTSGFFL- - - ISRYTCNAEKLGAF I KKTTLI YGVAILLYI P INVYN 



GYFFFESILNIAKIYLLPGFIVAFLYLGMSHTLlWIPAVILGOTIIQGLLKYVGTRGrFITVVVLYCrGAV-ErYSVFIQS 
III = = l II I = = I = 111=11 1=1 I Ml I 1= =11 II ==l ==1 

GYFKMDNL LPNIIKDIVFDGTLYHLWYLPASIIGAAIAV^TO^HYRKAFLIASILYIIGLFGDSYYGIVKS 



TKFYPLMSTYMSIFQTT---RNGLFYTPWLLAGYLLYDYFCTDLFTKSRGLK-YILFLLLI^ENVLIYFN-QGLDKNF 
I I III I 111 = 1= 1 = = = = 1=11 =1 ==1111- |=|=|| = 

--VSCMVFYNLIFQLTDYTFJJGIFFAPIFFVLGGYISDSPNR--YRKK1WIRIYSLFCLMFGKTLTLQHFDIQKIIDSMY 



1053 1080 1110 1140 1170 1200 1230 1260 

FLLAP LCAVFL-FmSIRTSLFKEYRLSPLKQLSVYYFFLPPLFIGIVSYCLKSTSLVAHHQGKVIFVVTLALTHA 

III I ==l I II I = I 1 = III 
VLLLPSWCLENLLLHFRGKRRTGL-RTISLDQLYHSSVYDCCOTIVCTffiLLHLQ 



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 515

A DNA sequence (GBSx0553) was identified in S.agalactiae <SEQ ID 1649> which encodes the amino 
acid sequence <SEQ ID 1650>. Analysis of this protein sequence reveals the following: 

Possible site: 43
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3088 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.
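
By way of illustration only, the localisation calls in a "Final Results" block of the kind shown above can be read programmatically. The following Python sketch is not part of the original analysis: the function names `parse_final_results` and `predicted_site` are hypothetical, and the regular expression simply assumes the line layout seen in this document, tolerating the stray spaces that the transcription leaves inside the certainty values.

```python
import re

# Matches lines such as
#   "bacterial cytoplasm --- Certainty=0. 3088 (Affirmative) < suco"
# Spaces inside the number are tolerated because the transcription often splits it.
_LINE = re.compile(
    r"bacterial\s+(?P<site>membrane|outside|cytoplasm)\b.*?"
    r"Certainty\s*=\s*(?P<value>[0-9][0-9 .]*[0-9])\s*\((?P<call>[^)]*)\)",
    re.IGNORECASE,
)


def parse_final_results(text):
    """Return {site: (certainty, call)} for every localisation line found."""
    results = {}
    for match in _LINE.finditer(text):
        value = float(match.group("value").replace(" ", ""))
        results[match.group("site").lower()] = (value, match.group("call").strip())
    return results


def predicted_site(results):
    """Return the site with the highest certainty, or None if nothing parsed."""
    if not results:
        return None
    return max(results, key=lambda site: results[site][0])


example = """
bacterial cytoplasm Certainty=0. 3088 (Affirmative) < suco
bacterial membrane — Certainty=0 . 0000 (Not Clear) < suco
bacterial outside — Certainty=0.0000 (Not Clear) < suco
"""
parsed = parse_final_results(example)
print(predicted_site(parsed), parsed)
```

Taking the maximum certainty is only one possible reading of the report; where all three values are low, the call should be treated with corresponding caution.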

Example 516 

A DNA sequence (GBSx0554) was identified in S.agalactiae <SEQ ID 1651> which encodes the amino
acid sequence <SEQ ID 1652>. Analysis of this protein sequence reveals the following:

Possible site: 35
>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.1446 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 517 

A DNA sequence (GBSx0555) was identified in S.agalactiae <SEQ ID 1653> which encodes the amino 
acid sequence <SEQ ID 1654>. Analysis of this protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 10
McG: Discrim Score: 8.28
GvH: Signal Score (-7.5): -2.11
Possible site: 20

>>> Seems to have a cleavable N-term signal seq.
ALOM program count: 6 value: -8.33 threshold: 0.0

INTEGRAL Likelihood = -8.33 Transmembrane 358 - 374 ( 354 - 376)
INTEGRAL Likelihood = -8.23 Transmembrane [illegible] - 280 ( 257 - 290)
INTEGRAL Likelihood = -6.37 Transmembrane 210 - 226 ( 206 - 232)
INTEGRAL Likelihood = -5.95 Transmembrane 163 - 179 ( [illegible] - 180)
INTEGRAL Likelihood = -5.10 Transmembrane 23 - 39 ( 21 - 40)
INTEGRAL Likelihood = -1.70 Transmembrane 297 - 313 ( 296 - 314)
PERIPHERAL Likelihood = 1.75 322
modified ALOM score: 2.17

*** Reasoning Step: 3

Final Results

bacterial membrane --- Certainty=0.4333 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 518 

A DNA sequence (GBSx0556) was identified in S.agalactiae <SEQ ID 1655> which encodes the amino 
acid sequence <SEQ ID 1656>. This protein is predicted to be an ABC transporter (ATP-binding protein).
Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0. 1510 (Affirmative) < suco 

bacterial membrane Certainty=0. 0000 (Not Clear) < suco 

bacterial outside Certainty=0. 0000 (Not Clear) < suco 

A related GBS nucleic acid sequence <SEQ ID 10199> which encodes amino acid sequence <SEQ ID
10200> was also identified.

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAB88481 GB:AL353816 putative ABC transport system ATP-binding 
protein [Streptomyces coelicolor A3 (2) ] 
Identities = 104/284 (35%), Positives = 158/284 (55%), Gaps = 18/284 (6%)

Query: 6 TMLLQLDNITKSYGKKIVLNQISYQFTPGLYGLLGANGTGICrTLLNLMSHFTLADSGNIY 65 

T + ++ YG+ L+ +S + TPG+ GLLG NG GKTTLIi +++ AD G 
Sbjct: 2 TPTVSASGLSLHYGRTRALDDVSLRLTPGVTGLLGPNGAGKTTLLRVIATAVPADRGAFT 61 

Query: 65 WNGQEQS EEFYRHIGFLPQHFRYYDQFTGIAFLNYIATLKGV-DKKKAKQEIPRL 119 

G + +E R +G+LPQ ++ FT F++Y+A LK + D+++ +E+ R+ 

Sbjct: 62 VLGHDPGSSRGRQEVRRRLGYLPQTPGFHPDFTAFEFVDYVAILKELADRRERHREVRRV 121 

Query: 120 LELVGLGDVGKKKISSYSGGMKQRLGIAQALINDPEILILDEPTVGLDPKERVKFRHILS 179 

LE V LG+V ++I SGGM+QR+ +A AL-t- DP L+LDEPTVGLDP++R++FR +++ 
Sbjct: 122 LEEVDLGEWGRRIKKLSGG^QRVALAAALVGDPC-FLVLDEPTVGLDPEQRMRFRELIA 181 

Query: 180 QLSTNKIIILSTHIVSDVEAVAKEIIVLKNGKFIEHGNTAQLLKTIEGKVWEIT-TEPGL 238 

+ ++LSTH DV + +IV+ G G A+L G+VW T +PG 

Sbjct: 182 GAGEGRTVLLSTHQTEDVAMLCHRVIVMAAGAVRFDGTPAELTARAAGRVWSSTEKDPG- 240 

Query: 239 SQIPNIAIVNEKVFSDSRVFRWSDICPSDSAQLWPTLEDFYI 282 

A + +SFRVDP A+ PTLED Y+ 
Sbjct: 241 AKAGWRTGTGS - - FRNVGD- - PPPGAEPAEPTLEDGYL 274 

There is also homology to SEQ ID 686. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 519

A DNA sequence (GBSx0557) was identified in S.agalactiae <SEQ ID 1657> which encodes the amino 
acid sequence <SEQ ID 1658>. This protein is predicted to be a response regulator. Analysis of this protein
sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0 .3781 (Affirmative) < suco 

bacterial membrane Certainty=0 . 0000 (Not Clear) < suco 

bacterial outside Certainty=0 . 0000 (Not Clear) < suco 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAC10170 GB:AJ278301 response regulator [Streptococcus pneumoniae] 
Identities = 135/242 (56%), Positives = 183/242 (75%)

Query: 1 MNIFILEDDFVQQAHFEKIIKEIRVQYNLHFKTVETFAKPVQLLESIYEIGLHNLFFLDI 60

M IF+LEDDF QQ E I+++ ++++ + E F KP QLL ++E G H LFFLDI 
Sbjct: 1 MRIFVLEDDFSQQTRIETT1EKLLKEHHITLSSFEVFGKPDQLLAEVHEKGAHQLFFLDI 60 



Query: 121 RIEEVLLYVDGICNKPLVENSFYFKSRYSQVQLPFKDLLYIETSSRSHRVVLYTEKDRME 180

RIE LLY + +K L E+ FYFKS+++Q Q PF ++ Y+ETS R HRV+LYT+ DR+E 
Sbjct: 121 RIETALLYANSQDSKSLAEDCFYFKSKFAQFQYPFKEVYYLETSPRPHRVILYTKTDRLE 180 

Query: 181 FTATLGDILKQEPRLFQCHRSFLVNPLNIFKVDRIDRLVYFQNGTTCLVSRNKVRDIVSI 240 

FTA+L ++ KQEPRL QCHRSFL+NP N+ +D+ ++L++F NG +CL++R KVR++ 
Sbjct: 181 FTASLEEVFKQEPRLLQCHRSFLINPANVVHIJDKKEKLLFFPNGGSCLIARYKVREVSEA 240 

Query: 241 VD 242 

Sbjct: 241 IN 242 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1659> which encodes the amino acid 
sequence <SEQ ID 1660>. Analysis of this protein sequence reveals the following: 

Possible site: 44 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0 .2098 (Affirmative) < suco 

bacterial membrane Certainty=0 . 0000 (Not Clear) < suco 

bacterial outside Certainty=0 . 0000 (Not Clear) < suco 

An alignment of the GAS and GBS proteins is shown below: 

Identities = 106/235 (45%) , Positives = 159/235 (67%) 

Query: 1 MNIFILEDDFVQQAHFEKIIKEIRVQYNLHFKTVETFAKPVQLLESIYEIGLHNLFFLDI 60 

MNIFILEDDF+QQ E 1+ I + + +E F+ P +L ESI E G H L+FLDI 
Sbjct: 2 MNIFILEDDFIQQTRIESIWGILKETRIPCNQI1EVFSTPQKI1FESIQERGDHQLYFLDI 61

Query: 61 EIKNDEQMGLEVAKQIRQVDPYAQIVFVTTHSELMPLTFRYQVSALDYIDKGLSQEEFSQ 120 

EI + GLE+A IRQ DP A IVFVTTHSE P++F+Y+VSALD+IDK Q++F + 
Sbjct: 62 EIGEYTRCGLELAAAIRQKDPNAVIVFVTTHSEFAPISFKYKVSALDFIDKAGGQKQFKE 121 

Query: 121 RIEEVLLYVDGICNKPLVENSFYFKSRYSQVQLPFKDLLYIETSSRSHRVVLYTEKDRME 180

+ IEE + Y + + ++ F F++ ++++LP+ D+LY T++ H+V L+T+ +R+E 
Sbjct: 122 QIEECIRYTYDMMSSRESKDMFLFETPQTRLKLPYKDILYFATATTPHKVCLWTQTERLE 181 
Query: 181 FTATLGDILKQEPRLFQCHRSFLVNPLNIFKVDRIDRLVYFQNGTTCLVSRNKVR 235

F L +1 P+LF CHRS+LVN + ++D+ +L+YF+NG +C+VSR K++ 
Sbjct: 182 FYGNLSEIQAVAPKLFLCHRSYLVNLDK^'VRIDKSKQLLYFENGDSCMVSRLKMK 236 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
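
For illustration, the "Identities" and "Positives" figures quoted with each alignment can be recomputed from the underlying counts. The Python sketch below is not part of the original analysis and the function name `alignment_stats` is hypothetical. The percentages printed in the alignment headers appear to be truncated or otherwise rounded, so a recomputed fraction may differ from the quoted figure by about one percentage point.

```python
import re

# Captures "Identities = 106/235", "Positives = 159/235" and "Gaps = 8/324".
_STAT = re.compile(r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)")


def alignment_stats(summary):
    """Return {name: (count, aligned_length, fraction)} from a summary line."""
    stats = {}
    for name, count, length in _STAT.findall(summary):
        count, length = int(count), int(length)
        stats[name] = (count, length, count / length)
    return stats


line = "Identities = 106/235 (45%) , Positives = 159/235 (67%)"
for name, (count, length, fraction) in alignment_stats(line).items():
    print(f"{name}: {count}/{length} = {fraction:.1%}")
```

Summary lines in which a digit has been mis-transcribed as a letter will not match the pattern and are skipped rather than guessed at.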

Example 520 

A DNA sequence (GBSx0558) was identified in S.agalactiae <SEQ ID 1661> which encodes the amino 
acid sequence <SEQ ID 1662>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm — - Certainty=0 .2651 (Affirmative) < suco 

bacterial membrane --- Certainty=0 . 0000 (Not Clear) < suco 

bacterial outside --- Certainty=0 . 0000 (Not Clear) < suco 

The protein has no significant homology with any sequences in the GENPEPT database. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1663> which encodes the amino acid 
sequence <SEQ ID 1664>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.0535 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 
Identities = 177/269 (65%) , Positives = 219/269 (80%) 

Sbjct: 1 

Query: 66 GTPALHQDNFALQLVHYLNLQGLHYHWTWAYNHIGYSKYHEGVAILSLKPLKPEDILVSA 125

GTP++H+D+FAL L+HYL +G HY+W+WAYNHIGY Y EGVAILS +P+ DILVSA 
Sbjct: 61 GTPSIHKDHFALLLIHYLQKRGQHYYWSWAYNHIGYDIYQEGVAILSKQPIOTSDILVSA 120 

Query: 126 VDDETDYHTRRALVAETTIjNDKVVT^/VSLHFSWFEKGFAEEWKRLETTLLEVETPLLLMG 185 

+DDETDYHTRR+L+A+TTL+ K V W++H SWF+KGF EW++LE LL + PLLLMG 
Sbjct: 121 mDETDYHTRRSLIAKTTLDGKEVAVVN^raLSWFDKGFLGEWEKLEKELLTMCPLLLMG 180 

Query: 186 DFl^PTGNQGYELv^^SPIALKI)SHQIAlraVFGDHTl^W)IDGWEGNKKALKVDHIFTSE 245 

DFNNPT GY++++ SPL L+DSH+ A+HVFGDH+I+ADIDGW+GNK+ALKVDH+FTS+ 
Sbjct: 181 DFNNPTDQDGYQVMMGSPLDLQDSHKGADHVFGDHSIVADIDGWQGNKEALKVDHVFTSK 240 

Query: 246 DLSISSSQWFEGGEAPWSDHYGLEITM 274 

D I SS++ FEGG+APWSDHYGLE+T+ 
Sbjct: 241 DFIIRSSKITFEGGDAPWSDHYGLEVTL 269 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
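
Because each alignment row carries its own start and end coordinates, a row can be checked for transcription damage by comparing the numbered span with the number of residues actually printed. The Python sketch below is an illustration only, with hypothetical function names, and assumes that gap characters and stray spaces are the only non-letter symbols in a row.

```python
def residues(sequence):
    """Count residues in an alignment row, ignoring gap characters and spaces."""
    return sum(1 for ch in sequence if ch.isalpha())


def row_is_consistent(start, sequence, end):
    """True when the numbered span [start, end] matches the residue count."""
    return end - start + 1 == residues(sequence)


# Two subject rows taken from the GAS/GBS alignment above.
rows = [
    (61, "GTPSIHKDHFALLLIHYLQKRGQHYYWSWAYNHIGYDIYQEGVAILSKQPIOTSDILVSA", 120),
    (241, "DFIIRSSKITFEGGDAPWSDHYGLEVTL", 269),
]
for start, sequence, end in rows:
    status = "consistent" if row_is_consistent(start, sequence, end) else "damaged"
    print(start, end, residues(sequence), status)
```

A consistent count does not prove that every letter is correct, since a substituted character leaves the length unchanged; the check only flags rows in which residues have been dropped or merged.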
Example 521

A DNA sequence (GBSx0559) was identified in S.agalactiae <SEQ ID 1665> which encodes the amino 
acid sequence <SEQ ID 1666>. This protein is predicted to be PTS system, glucose-specific enzyme II, A 
component (ptsG). Analysis of this protein sequence reveals the following:

Possible site: 37 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood =
INTEGRAL Likelihood =
INTEGRAL Likelihood =
INTEGRAL Likelihood =
INTEGRAL Likelihood =
INTEGRAL Likelihood =
INTEGRAL Likelihood =
Transmembrane 28 -
Transmembrane 153 - 169

Final Results

bacterial membrane --- Certainty=0.4227 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



A related GBS nucleic acid sequence <SEQ ID 10201> which encodes amino acid sequence <SEQ ID
10202> was also identified.

The protein has homology with the following sequences in the GENPEPT database: 



Query: 293 DLINLKGS-NSSQYHHLLTSWPARFKVGQMIGASGILMGLSYMIYRNVDKDI<KLKYKSM 351 

DLI+LKG+ + SQYHHLLTSVTPARFKVGQMIG+SGILMGL+ AMYRNVD DKK KYK M 
Sbjct: 3 DLIHLKGAGHMSQYHHLLTSVTPARFKVGQMIGSSGILMGLTLAMYRNVDPDKKEKYKGM 62 

Query: 352 FISAAAATFLTGVTEEIEYMFMFAAMPLYLWAWC^KIAFAMADIVNLRVHSFGNIEFLT 411 

F+SAA A FLTGVTEP+EYMFMFAA+PLYLVYAWQG AFA AD+++LRVHSFGNIEFLT 
Sbjct: 63 FLSAAVAVFLTGVTEPLEYMFMFAALPLYLVYAWQGLAFASADLIHLRVHSFGNIEFLT 122 

Query: 412 RVPMGIKAGLGGDIFNFVWTLLFAVljMYFIANFMIKKFNLATAGRNGNYDNEEVDNAPS 471 

+ PM IKAGL DI NF4- V+++F V MYFI WFMI KKFNLAT+GRNGNYD + D + 
Sbjct: 123 KTPMAIKAGLAT/IDIWFIWSWFGVAMYFITNFMIKKFNLATSGRNGNYDTGD-DASDE 181 

Query: 472 TAS GSADANSQWQVINLLGGRDNIEDTOACMTRLRVTVKDGNSVGSEAAWKKAGA 527 

TAS G+A+ANSQ+V++INLLGG++NI DVDACMTRLR+TV D VG EAAWKKAGA 
Sbjct: 182 TASNSNAGTANANSQIVKIINLLGGKENISDTOACMTRLRITVTDVAKVGDEAAWI<KAGA 241 

Query: 528 MGLVLKGNGVQAIYGPKADVLKSDIQDLLDSGTVIPIVDLETGQPVAAAPVTTYKGITEE 587 

MGL++KGNGVQA+YGPKADVLKSDIQDLLDSG IP D+ + A V ++KG+TEE 
Sbjct: 242 MGUVKGNGVQAVYGPKADVLKSDIQDLLDSGVDIPKTDVTAPEEDKTADV-SFKGVTEE 300 

Query: 588 IVSVANGQVEALDVVTOPVFSQKMMGDGFAVEPTDGNIYVPVSGTVTSVFPTKHAFGLLT 647 

+ +VA+GQV + V DPVFSQKMMGDGFAVEP +GNIY PV+G VTSVFPTKHA GLLT 
Sbjct: 301 VATVADGQVLPITQVHDPVFSQKMMGDGFAVEPENGNIYSPVAGLVTSVFPTKHALGLLT 360 

Query: 648 ESGLEVLVHIGLDTVALDGQPFEWISSGQKWAGDLAWADLEAIKAA 696 

+ GLEVLVH+GLDTVAL+G PF K+ GQ+V GDL +VADLEAIK+A 
Sbjct: 361 DDGLEVLVHVGLDTVALNGAPFSAKVKDGQRVALGDLLLVADLEAIKSA 409 



A related DNA sequence was identified in S.pyogenes <SEQ ID 1667> which encodes the amino acid 
sequence <SEQ ID 1668>. Analysis of this protein sequence reveals the following: 



Possible site: 33 
>>> Seems to have a cleavable N-term signal seq.
INTEGRAL Likelihood =-13.43 Transmembrane 186 - 202 ( 181 -
INTEGRAL Likelihood = -6.79 Transmembrane 419 - 435 ( 412 -
INTEGRAL Likelihood = -5.52 Transmembrane 61 - 77 ( 57 -
INTEGRAL Likelihood = -3.56 Transmembrane 363 - 379 ( 363 - 381)
INTEGRAL Likelihood = -1.97 Transmembrane [illegible] - 159 ( 142 - 160)
INTEGRAL Likelihood = -0.16 Transmembrane 343 - 359 ( 343 - 359)

Final Results

bacterial membrane --- Certainty=0.6371 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases:

>GP:AAD00231 GB:D78S00 putative ptsG protein [Streptococcus mutans] 
Identities = 288/407 (70%), Positives = 331/407 (80%), Gaps = 2/407 (0%) 

Query: 286 DLTOLKGSD-ASAYSHLMDSVTPARFKVGQMIGATGTmGVALA^mMVDADKKHTYKMM 344 
DL+HLKG+ S Y HL+ SVTPARFKVGQMIG++G LMG+ LAMYRNVD DKK YK M

Sbjct: 3 DLIHLKGAGHMSQYHHLLTSVTPARFKVGQMIGSSGILMGLTIjAMYRNVDPDKKEKYKGM 62 

Query: 345 FISAAAAVFLTGVTEPLEYLFMFAAMPLYIVYALVCGASFAMADLVNLRVHSFGNIELLT 404 
F+SAA AVFLTGVTEPLEY+FMFAA+PLY+VYA-4-VQG +FA ADL++LRVHSFGNIE LT 
Sbjct: 63 FLSAAVAVFLTGVTEPLEYMFMFAALPLYLVYAWQGLAFASADLIHLRVHSFGNIEFLT 122



Query: 405 RTPMALKAGLGMDVINFVWVSVLFAVimFIADMMIKKMHLATAGRLGNYDA-DILGDRN 463 

+TPMA+KAGL MD+4-NF+ VSV+F V MYFI + MIKK +LAT+GR GNYD D D 
Sbjct: 123 KTPMAIKAGLAMDIWFIWSWFGVA^FITlIFMIKiCFNLATSGRNGNYDTGDDASDET 182 

Query: 464 TQTRPTQVADSNSQWQIWLLGGAGNIDDVDACMTRLRVTVKDPAKVGAEDDWKKAGAI 523 

A+ +NSQ+V+ 1 +NLLGG NI DVDACMTRLR+TV D AKVG E WKKAGA+ 
Sbjct: 183 ASNSNAGTANANSQIVKIimLGGICF^ISDVnACMTRLRITVTDVAKVGDFJVAWKKAGAM 242 

Query: 524 GLIQKGNGVQAVYGPKADILKSDIQDLLDSGALIPEVNMSQLTSKPTPAKDFKHVTEDVL 583 

GLI KGNGVQAVYGPKAD+LKSDIQDLLDSG IP+ +++ T FK VTE+V 

Sbjct: 243 GLIVKGNGVQAWGPJ?ADVLKSDIQDLLDSGVDIPKTDVTAPHEDKTADVSFKGVTEEVA 302 

Query: 584 SVADGMVLPITGVKDQVFAAKMMGDGFAVEOTHGNIYAPVAGLVTSVFPTKHAFGLLTDN 643 

+VADG VLPIT V D VF+ KMMGDGFAVEP +GNIY+PVAGLVTSVFPTKHA GLLTD+ 
Sbjct: 303 TVADGQVLPITQVHDPVFSQiWIGDGFATOPEKGNIYSPVAGLVTSVFPTKHALGIjLTDD 362 



Query: 644 GLEVIjVHVGLDTVALNGVPFSVKVSEGQRVHAGDLLvVADLAAIKSA 690 

GLEvLVHVGLDTVALNG PFS KV +GQRV GDLL+VADL AIKSA 
Sbjct: 363 GLEVLVHVGLDTVALNGAPFSAKVKDGQRVALGDLLLVADLEAIKSA 409 

An alignment of the GAS and GBS proteins is shown below: 

Identities = 517/731 (70%) , Positives = 606/731 (82%) , Gaps = 7/731 (0%) 

Query: 8 MKNOTKQLFSFEFWQKFGKALMVVIAVMPAAGLI4VSIGNSISLLDPSNVLLGRIANVIAQ 67 

MK + KQLF FEFWQKFGK LMWIAVMPAAGLM+SIGNSI +++ + L + N+IAQ 
Sbjct: 1 MKTSFKQLFRFEFWQKFGKCLMWIAVMPAAGLMISIGNSIPMINHDSAFLASLGNIIAQ 60 

Query: 68 IGWGVIGKLHILFALAIGGSWAKERAGGAFAAGLSFILINLITGNFFGVKTDMLADSKAT 127 

IGW VI KLH+LFALAIGGSWAKERAGGAFA+GL+F+LIN ITG F+GV + MLAD +A 
Sbjct: 61 IGWAVIVNLHLLFAIAIGGSWAKERAGGAFASGLAFVLINRITGAFYGVSSTMLADPEAK 120 



Query: 128 VQTVFGATIRVSDYFVNVLGQPALNMGVFVGI I SGFVGATAFNKYYNYRKLPDALTFFNG 187 
+ ++ G + V DYF +VL PALN GVFVGI I +GFVGATA+NKYYNYRKLP+ LTFFNG 
Sbjct: 121 ITSLLGTQMIVKDYFTSVLESPALNTGVFVGI IAGFVGATA.YNKYYNYRKLPEVLTFFNG 180



Query: 188 KRFVPFWIYRSVIVALILSVFKPWQSGINGFGKWIASSQDSAPILAPFVYGTLERLLL 247 

KRFVPFWI RS+ VALIL V WPV+QSGIN FG WIASSQDSAPILAPF+YGTLERLLL 
Sbjct: 181 KRFVPFWILRSIFVALILVWWPVIQSGINSFGMWIASSQDSAPILAPFLYGTLERLLL 240 

Query: 248 PFGLHHMLTIPMJSTYTQLGGTYTVLTGATKGAQVLGQDPLWIAWVGDLINLKGSNSSQYHH 307 

PFGLHHMLTI PMNYT LGGTY V+TGA G +V GQDPLWLAWV DD++LKGS++S Y H 
Sbjct: 241 PFGIfiHMLTIPMNYTALGGTYETOTGiAAAGTWFGQDPLWIiAJTODLVHLKGSDASAYSH 3 00 

Query: 
Sbjct: 301 IOTSOTPARFKTCQMI<^TGTI«V^^ 360

Query: 3S8 IEYMFMFAAMPtYIiWAWQGCWAMftDIVNLRVHSFGNIEFLTRVPMGIKftGLGGDIFN 427 

+EY+FMFAAMPLY+VYA+VQG +FAMAD+VNLRVHSFGNIE LTR PM +KAGLG D+ N 
Sbjct: 3S1 LEYLFMFAAMPLYIVYALVQGASFAMADLVKLRVHSFGNIELLTRTPMALKAGLGMDVIN 420 

Query: 428 FVWVTLLFAVlireFIANFMIKKFNLATAGRNGK\ r DNEEVD--NAPSTASGSADANSQVVQ 485 

FVWV++LFAV+MYFIA+ MIKK +LATAGR GNYD + + N + + AD+NSQWQ 
Sbjct: 421 FVWVSVLFAVIMYFIADMMIKHvlHIATAGRLGNYDADILGDRNTQTRPTQVADSNSQVVQ 480 

Query: 486 VII^LGGRDNIEDVDACMTRLRVTVKDGNSVGSFAAWKKaGANiGIjVLKGMGVQAIYGPKA 545 

++NLLGG NI+DVDACMTRLRVTVKD VG+E WKKAGA+GL+ KGNGVQA+YGPKA 
Sbjct: 481 IVNLLGGAGNIDDVDACMTRLRVTVKDPAKVGAEDDWKXRGAIGLIQKGNGVQAVYGPKA 54D 

Query: 546 DVLKSDIQDLLDSGTVIPIVDLE- -TGQPVAA&PVTTYKGITEEIVSVANGQVEALDWK 603 

D-t-LKSDIQDLLDSG +IP V++ T +P P +K +TE+++SVA+G V + VK 
Sbjct: 541 DILKSDIQDLLDSGALIPEVNMSQLTSKP TPAKDFKHVTEDVLSVADGMVLPITGVK 597 

Query: 604 DPVFSQKMMGDSFAVEPTDGNIYVPVSGTVTSVFPTKHAFGLLTESGLEVLVHIGLDTVA 663 

D VF+ KMMGDGFAVEPT GNIY PV+G VTSVFPTKHAFGLLT+ +GLEVLVH+GLDTVA 
Sbjct: 598 DQVFAAKMMGDGFAVEPTHGNIYAPVAGLVTSVFPTKHAFGLLTDNGLEVLVHVGLDTVA 657 

Query: 664 LDGQPFEVKlSSGQKWAGDtAWADLEAIKARGKETSVIIVFTHVSDIKTVKLEKSGPQ 723 

L+G PF VK+S GQ+V AGDL WADL AIK+A +ET +++ FTN V L G Q 

Sbjct: 658 LNGVPFSVKVSEGQRVHAGDLLWADrAAIKSAERETIIWAFTNTTEIQDVTLTSLGAQ 717 

Query: 724 IAKTWAKVEL 734 

AKT VA VEL 
Sbjct: 718 PAKTKVATVEL 728 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 522 

A DNA sequence (GBSx0560) was identified in S.agalactiae <SEQ ID 1669> which encodes the amino 
acid sequence <SEQ ID 1670>. Analysis of this protein sequence reveals the following: 

Possible site: 14 

»> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0. 2266 (Affirmative) < suco 

bacterial membrane --- Certainty=0 . 0000 (Not Clear) < suco 

bacterial outside --- Certainty=0. 0000 (Not Clear) < suco 

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.



Example 523 

A DNA sequence (GBSx0561) was identified in S.agalactiae <SEQ ID 1671> which encodes the amino 
acid sequence <SEQ ID 1672>. This protein is predicted to be alkaline phosphatase synthesis sensor protein
PhoR (hpkA). Analysis of this protein sequence reveals the following:

Possible site: 34 

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood =-13.96 Transmembrane 160 - 176 ( 148 - 183)
INTEGRAL Likelihood = -8.65 Transmembrane 20 - 36 ( 13 - 41)

Final Results

bacterial membrane --- Certainty=0.6583 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 8595> which encodes amino acid sequence <SEQ ID 8596> 
was also identified. Analysis of this protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 6
SRCFLG: 0
McG: Length of UR: 26
Peak Value of UR: 3.27
Net Charge of CR: 3
McG: Discrim Score: 14.63
GvH: Signal Score (-7.5): -5.64
Possible site: 26
>>> Seems to have an uncleavable N-term signal seq
Amino Acid Composition: calculated from 1
ALOM program count: 2 value: -13.96 threshold: 0.0
INTEGRAL Likelihood =-13.96 Transmembrane 152 - 168 ( 140 - 175)
INTEGRAL Likelihood = -8.65 Transmembrane 12 - 28 ( 5 - 33)
PERIPHERAL Likelihood = 1.59 135
modified ALOM score: 3.29
icml HYPID: 7 CFP: 0.658

Final Results

bacterial membrane --- Certainty=0.6583 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS gene <SEQ ID 8593> and protein <SEQ ID 8594> were also identified. Analysis of this
protein sequence reveals the following:

Lipop: Possible site: -1 Crend: 6 

McG: Discrim Score: 14.63 

GvH: Signal Score (-7.5): -5.64 
Possible site: 26 

>>> Seems to have an uncleavable N-term signal seq

ALOM program count: 2 value: -13.96 threshold: 0.0 

INTEGRAL Likelihood =-13.96 Transmembrane 152 - 168 ( 140 - 175) 
INTEGRAL Likelihood = -8.65 Transmembrane 12 - 28 ( 5 - 33) 
PERIPHERAL Likelihood = 1.59 135 
modified ALOM score: 3.29 

*** Reasoning Step: 3 

Final Results 

bacterial membrane Certainty=0. 6583 (Affirmative) < suco 

bacterial outside Certainty=0. 0000 (Not Clear) < suco 

bacterial cytoplasm Certainty=0. 0000 (Not Clear) < suco 
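
As an illustration of how the membrane-topology lines in such a report can be tabulated, the following Python sketch extracts each INTEGRAL entry and orders the predicted segments along the protein. It is not part of the original analysis; the function name `transmembrane_segments` is hypothetical and the pattern assumes the line layout used in this document. In the reports reproduced here, the ALOM "count" equals the number of INTEGRAL lines and the ALOM "value" equals the most negative likelihood, which the sketch makes easy to verify.

```python
import re

# One INTEGRAL line, e.g.
#   "INTEGRAL Likelihood =-13.96 Transmembrane 152 - 168 ( 140 - 175)"
_TM = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+(\d+)\s*-\s*(\d+)"
)


def transmembrane_segments(text):
    """Return (likelihood, start, end) tuples sorted by start position."""
    segments = [(float(l), int(s), int(e)) for l, s, e in _TM.findall(text)]
    return sorted(segments, key=lambda segment: segment[1])


report = """
ALOM program count: 2 value: -13.96 threshold: 0.0
INTEGRAL Likelihood =-13.96 Transmembrane 152 - 168 ( 140 - 175)
INTEGRAL Likelihood = -8.65 Transmembrane 12 - 28 ( 5 - 33)
PERIPHERAL Likelihood = 1.59 135
"""
segments = transmembrane_segments(report)
for likelihood, start, end in segments:
    print(f"{start}-{end}: likelihood {likelihood}, length {end - start + 1}")
```

Lines in which a coordinate is illegible do not match the pattern and are therefore left out rather than reconstructed.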

The protein has homology with the following sequences in the databases: 

34.9/61.1% over 363aa
Thermotoga maritima
EGAD|131465| sensor histidine kinase HpkA Insert characterized
GP|1575578|gb|AAC44437.1||U67196 histidine protein kinase Insert characterized
GP|4982228|gb|AAD36721.1|AE001807_12|AE001807 sensor histidine kinase HpkA Insert characterized
PIR|C72228|C72228 sensor histidine kinase HpkA - (strain MSB8) Insert characterized

ORF00680 (919 - 1977 of 2277)
EGAD|131465|TM1654 (48 - 411 of 412) sensor histidine kinase HpkA {Thermotoga maritima}
GP|1575578|gb|AAC44437.1||U67196 histidine protein kinase {Thermotoga maritima}
GP|4982228|gb|AAD36721.1|AE001807_12|AE001807 sensor histidine kinase HpkA {Thermotoga maritima}
PIR|C72228|C72228 sensor histidine kinase HpkA - Thermotoga maritima (strain MSB8)

%Match =13.6 

%Identity =34.8 %Similarity =61.0 

Matches =125 Mismatches = 134 Conservative Sub.s = 94 

720 750 780 810 840 870 900 930

AAQRIOTGTIWLSVAQQTIFYLLLGMISPIAIIILI^IILSVLIARYIAKKVSEPMNIDLDHPLSNDSYEEITPLLRR 
: =: :| 11= |::|| 1= = = hi 

MSVFLFVIVAVLFVLLFLVFKKRLiSEYKILIEKLSDMLGEKGVPPLYLFER 
10 20 30 40 50 


960 990 1020 1050 1080 1110 1140 1170 

LDSHQAKIQHQKLIjLQKRQKEFDTI1SKIKEGMILLDDQARIVSIN7\EMICLFQIM5DWHGRFMMEVSRDLTLKDLIDQG 
I = .. =: : I 11 = = = I = = I I = = I = I I = I I =11 1 = I = = = = = = 

LHCYVDNLKETISRVEVSRDNFLTILNSLSEPIFILDREGKITFLlffilARELVQGRINPEGRPYYEIFEDYYIISrEMTO 
70 80 90 100 110 120 130

1197 1215 1245 1275 1305 1335 1365 1395 

LKGKK- KEAN IGIENNHYRVLWPTTDNWVTGLWLLFDVTDQLQMEQLQREFTANVSHELKTPLHVISGYSELL 

= 1 == =1 =1 I == I I I : :| = |: III = = = - = = 111 I 11111 = 111 I IM I 

IKSEEPQEGTLVTYVGNEKKYFHVKVI PVELKSGDKI FVILFHDVTKERKLDEMRREFIATVSHELRTPLTS1HGYAETL

150 160 170 180 190 200 210 

1452 1482 1512 1539 1569 1599 1629 

ANQMVPNEE-VPQFAAKIHKESERLVKLVEDIINLSHLDEQE-KLPQETVI&YDLTQKVLEGLQAKADKKHIQINFNGEE 
30 = |=| | =| | =|| |= =|= |:==| ==| | = = |=| == = | =| |== = |= 

LEDDLENKELvTO^FLKIIEEESARMTRLllTOLLDLEKIEESEANFEMKDVDLCEVIEYVYRIIQPIAEENEVDLIVECED 
230 240 250 260 270 280 290 

1659 1639 1713 1767 1797 1827 1857 

AILRGNPVLUSTSLVYNLCDNAITYNH- -EKGQVNVTLK- -NSPDTITLEVSDTGLGIAEKDKKRIFERFYRVDKSRSKIV

"III. I == II 111= I 111= I == "II ■■ =11 III II «» « 1111 = 111111 = 11= = 
VVVRGNKERLIQMLLiNnLVDNAVKYTSLKEKGEKKVWVRAYDTPDWVVVEVEDTGPGI PKEAQSRI FEKFYRVDKARSRKM 
310 320 330 340 350 360 370 

1887 1917 1947 1977 2007 2037 2067 2097

GGTGLGkSIWSALDFHNGSiraroSHLGC£TTMTVLL^ 
1111111=111= =1 I I 1=1=1 = III I III h 
GGTGLGLTIVKT1VDKHGGKIEVESEINQGTLMRVLDPKRR 
390 400 410 


The protein has homology with the following sequences in the GENPEPT database: 

>GP:BAB06875 GB:AP001517 two-component sensor histidine kinase 

involved in phosphate regulation [Bacillus halodurans] 
Identities = 176/589 (29%) , Positives = 315/589 (52%) , Gaps = 47/589 (7%) 

Query: 9
MTK +R L+ ++ VT+L++ G +L N + +++E + + +
Sbjct: 1 MTKFRYRLVLA VLTVTLL VMAGLGIiVIGQI FKWVYLENLTDRLiKKETYLAASMVEN

Query: 57 QGI S F - EGKDYFENLKTS -NWITWVDNKGQVLYDTQSDAKHMKNHANRQE I KEAI KSGY 114
+ + FE+ E+ + R+T + G V+ ++ +D M+NHA+R E E ++ G
Sbjct: 57 EA VLENEVQTLTEE I SQKLDARVT I ILADGTWGESAADP7AEMENH7ADRPEFTE - LEEGI 115

Query: 115 GESTRWSATL-TEKSIYAAQRLN- -N0TI- -VRLSVAQQTIFYLLLGMISPLAIIILLAI 169
R+S T+ TE YA N N TI VRL + ++ + + + L+ +A
Sbjct: 116 - - -TOYSTTVETELLFYAVPIQNEANETIGYVRLGLPIEAvNSVNRTLWAILIVSFTIAF 172

Query: 170 ILSVLIARYIAKKVSEPUSINI DLDHPLSNDSYEEITPLLRRLDSHQAKIQ 219
++ V + IA ++ P+ + D S +S +E+ Ii R ++ ++
Sbjct: 173 LVIVSVTYRIANQMIRPIESATVATANKIAEGDYQARTSEESRDEVGQI^SINVIAYNLE 232

Query: 220 HQKLLLQKRQKEFDTIISKIKEGMILMDQARIVSINREALKLFQINDD-WHGRFMMEVS 278
Sbjct: 233
Query: 279
Sbjct: 293
Query: 333
Sbjct: 353
Query: 392
Sbjct: 413
Sbjct: 473
Query: 510
Sbjct: 533

++EQ++++F ANVSHELKTP+ I G++E h + + +E++ QF I KESERL
+V+ L+ KA++K I 1+ + E + L G+
RVD++RS+ GGTGLGL+IVK ++ H G I V+S G+GTT T+ H+
RVDRARSRNSGGTGLGIAIVKHLVEAHQGKILVESEFGKGTTFTIQFHR 581

There is also homology to SEQ ID 1178.

SEQ ID 8594 (GBS340) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 173 (lane 10; MW 86kDa). It was also expressed in E.coli as a His-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 11 (lane 7; MW 61.5kDa) and in Figure
77 (lane 10; MW 62kDa). 

Purified GBS340-GST is shown in Figure 223, lane 2; purified GBS340-His is shown in Fig. 191, lane 9. 

The purified GBS340-GST fusion product was used to immunise mice. The resulting antiserum was used 
for Western blot (Figure 254A), FACS (Figure 254B), and in the in vivo passive protection assay (Table 
III). These tests confirm that the protein is immunoaccessible on GBS bacteria and that it is an effective 
protective immunogen. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 524 

A DNA sequence (GBSx0562) was identified in S.agalactiae <SEQ ID 1673> which encodes the amino 
acid sequence <SEQ ID 1674>. This protein is predicted to be phosphate regulon transcriptional regulatory 
protein PhoB (phoB). Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm Certainty=0. 2617 (Affirmative) < suco 

bacterial membrane Certainty=0. 0000 (Not Clear) < suco 

bacterial outside Certainty=0.0000 (Not Clear) < suco 

A related GBS nucleic acid sequence <SEQ ID 10203> which encodes amino acid sequence <SEQ ID 
10204> was also identified. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAC73502 GB:AE000146 positive response regulator for pho 
regulon, sensor is PhoR (or CreC) [Escherichia coli K12]
Identities = 98/224 (43%), Positives = 138/224 (60%), Gaps = 2/224 (0%)

I VED+A IREM+ + Ii+ GF+ + + E PDLILLD MLPG G+

Query: 2
Sbjct: 5
Query: 62
Sbjct: 65
Query: 122
Sbjct: 125
Sbjct: 185

L +DPT++ V G E + + EF+LL F +P



There is also homology to SEQ ID 1182. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.



Example 525 

A DNA sequence (GBSx0563) was identified in S.agalactiae <SEQ ID 1675> which encodes the amino 
acid sequence <SEQ ID 1676>. This protein is predicted to be phosphate transport system regulatory
protein (phoU). Analysis of this protein sequence reveals the following: 

Possible site: 33 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0. 1188 (Affirmative) < suco 

bacterial membrane Certainty=0 . 0000 (Not Clear) < suco 

bacterial outside --- Certainty=0 . 0000 (Not Clear) < suco 

The protein has homology with the following sequences in the GENPEPT database:

>GP:AAG08750 GB:AE004948 phosphate uptake regulatory protein PhoU 
[Pseudomonas aeruginosa] 
Identities = 66/213 (30%) , Positives = 119/213 (54%), Gaps = 4/213 (1%) 

Query: 2 IRSRFASQLNDLNKEIIFMGALCEDIIGKSLGALTNSMDVYLDDISETYHKIEQMERDIE 61

I +F ++L D+ ++ MG h E + ++ AL +++ + E +1 QMER+I+ 

Sbjct: 11 ISQQFNAELEDTOSHLLAMGGLWKQVNDAWALIDADSGLAQQVREIDDQINQMERNID 70 

Query: 62 ERCLKLLLRQQPVAKDLRRISSALKMVYDMKRIGAQAYEIAEIVSLGHIIQGSGSERD-- 119 
E C+++L R+QP A DLR I S K V D++RIG +A ++A + + S R

Sbjct: 71 EECWIIARRQPAASDLRLI ISISKSVIDLERIGDEASKVARRAI - -QLCEEGESPRGYV 128 

Query: 120 QI^SMSNNVISMLTKSIDAFIYDI<IEEQAHQVIEQDRT\'NQEFDTIKKQLVLYFSVQDVDG 179 
++ + + V M+ +++DAF + + A V + D+TV++E+ T ++LV Y 
Sbjct: 129 EVRHIGSQVQKMVQEALDAFARFDADLMLSVAQVDKTvDREYKTALRELVTYMMEDPRAI 188

Query: 180 EYPIDVLMIAKYLERIGDHTWIAKWVLFSITG 212 

++++ + LERIGDH NIA+ V++ + G 
Sbjct: 189 SRVLNIIWALRSLERIGDHARNIAELVIYLVRG 221 


There is also homology to SEQ ID 1678. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics.

Example 526

A DNA sequence (GBSx0564) was identified in S.agalactiae <SEQ ID 1679> which encodes the amino 
acid sequence <SEQ ID 1680>. This protein is predicted to be ATP-binding cassette protein PstB (pstB-2).
Analysis of this protein sequence reveals the following: 



Final Results

bacterial cytoplasm --- Certainty=0.2[...]2(Affirmative) < succ>
bacterial membrane --- Certainty=0.0[...]0(Not Clear) < succ>
bacterial outside --- Certainty=0.0[...]0(Not Clear) < succ>



A related GBS nucleic acid sequence <SEQ ID 10205> which encodes amino acid sequence <SEQ ID 
10206> was also identified. 

The protein has homology with the following sequences in the GENPEPT database:

>GP:AAD22041 GB:AF118229 ATP-binding cassette protein PstB 
[Streptococcus pneumoniae] 
Identities = 166/245 (67%) , Positives = 211/245 (85%) , Gaps = 1/245 (0%) 



Query: 10 INNLDLYYGEFHALKDWLDIEEKEITAFIGPSGCGKSTLLKSINRMNDLVKNCKITGDI 69
          + +LDL+YG+F ALK++++ + E++ITA IGPSGCGKST LK++NRMNDLV +C I G +
Sbjct: 6 WHLDLFYGDFQALKNISIQLPERQITALIGPSGCGKBTFLKTLNRMTOLVPSCHIEGQV 65

Query: 70 TLEGEDVYR-QLDINQLRKKVG^IVFQKPNPFPMSIYDNVAFGPRTHGIHSKAELDDIVER 128
          L + +D+Y + ++NQLRK+VGMVFQ+PNPF MSIYDNVA+GPRTHGI K +LD +VE4
Sbjct: 66 LLDEQDIYSSKFNLNQLRKRVGMVFQQPNPFAMSIYDNVAYGPRTHGIRDKKQLDALVEK 125

Query: 129 SLKQAALWDEVKDRLHKSALGMSGGQQQRLCIARA1AIEPDVLLMDEPTSALDPISTAKI 188
           SLK AA+W+EVKD L KSA+ +SGGQQQRLCIARALA+EPD+LLMDEPTSALDPIST KI
Sbjct: 126 SLKGAAlWEEVKDDLKKSAMSLSGGQQQRLCIARAIiAVEPDILLMDEPTSALDPISTLKI 185

Query: 189 EELVIQLKKIiyTIVIWHNMQQAWISDKTAFFLMGEVVEYNKTSQLFSLPQDERTENYI 248
           E+L+ QLKK+YTI + I VTHNMQQA RISDKTAFFL GE+ E+ T +F+ P+D+RTE+YI
Sbjct: 186 EDLIQQLKKDYTI I IVTHNMQQASRISDKTAFFLTGEICEFGDTVDVFTNPKDQRTEDYI 245

Query: 249 TGRFG 253
           +GRFG
Sbjct: 246 SGRFG 250





There is also homology to SEQ ID 1682. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 527 

A DNA sequence (GBSx0565) was identified in S.agalactiae <SEQ ID 1683> which encodes the amino 
acid sequence <SEQ ID 1684>. This protein is predicted to be transmembrane protein PstA (pstA-2). 
Analysis of this protein sequence reveals the following: 

Possible site: 38 





>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood =-13.11 Transmembrane 265 - 281 ( 255 - 286)
INTEGRAL Likelihood = -8.[...] Transmembrane 79 - 95 ( 68 - 100)
INTEGRAL Likelihood = -4.78 Transmembrane 195 - 211 ( [...] - 213)
INTEGRAL Likelihood = -4.57 Transmembrane 147 - 163 ( 143 - 164)
INTEGRAL Likelihood = -2.92 Transmembrane 122 - 138 ( 120 - 138)
INTEGRAL Likelihood = -0.90 Transmembrane [...] - 56 ( [...] - 56)

bacterial membrane --- Certainty=0.6243(Affirmative) < succ>
bacterial outside --- Certainty=0.0000(Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000(Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAD22040 GB:AF118229 transmembrane protein PstA [Streptococcus pneumoniae] 
Identities = 135/263 (51%) , Positives = 203/263 (76%) 















         + L +VY + L+F ++ ++ +IL+KGLPH++ LF+WTY ++N+SL+PA I+T+ ++
Sbjct: 4 YI1LKLLVYCFSAI1TFGSI1FLIIGFILIKGLPHLSLSLFSWTYTSENISLMPAIISTVILV 63

Query: 83 ALTLLFAVPLGIGGSiyiiTEYARRDNPYLKIIRVATETIAGIPSIIYGLFGALFFVKYTH 142
          LL A+P+GI YL EY ++D+ +KI+R+A++TL+GIPSI++GLFG LFFV +
Sbjct: 64 FGALLLALPIGIFAGFYLVEYTKKDSLCVKIMRLASDTLSGIPSIVFGLFGMLFFWFLG 123

Query: 143 LGLSLISGSLTLSIMILPLIMRTTEEALLSVPDSYREGAFALGAGKLRTIFKIVLPSAMS 202
           SL+SG LT IM+LP+ 1 +R+TEEALLSV DS R+ ++ LGAGKLRT+ F+ IVLP AM
Sbjct: 124 FQYSLLSGILTSVIMVLPVIIRSTEEALLSVSDSMRQASYGLGAGKLRTVFRIVLPVAMP 183

Query: 203 GIFAGIIIAVGRIIGESAaLIFTAGTVAKVAHSVFSSSRTIiAVHMYAISGEGLYVDQTYA 262
           GI AG+IIA+GRI+GE+AAL++T GT S+ SS R+LA+HMY +S EGL+V++ YA
Sbjct: 184 GIIAGVIIAIGRIVGETAALI^TLGTSTNTPSSLMSSGRSIALHMYMIiSSEGLHVNEAYA 243

Query: 263 TAVILLLLVI IVNFVSGLVAKRL 285
           T VIL++ V+++N +S L++++L
Sbjct: 244 TGVILIITVLMINTLSSLLSRKL 266

There is also homology to SEQ ID 1686.

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

Example 528 

A DNA sequence (GBSx0566) was identified in S.agalactiae <SEQ ID 1687> which encodes the amino 
acid sequence <SEQ ID 1688>. Analysis of this protein sequence reveals the following:

Possible site: 39 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2687(Affirmative) < succ>
bacterial membrane --- Certainty=0.0000(Not Clear) < succ>
bacterial outside --- Certainty=0.0000(Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

Example 529 

A DNA sequence (GBSx0567) was identified in S.agalactiae <SEQ ID 1689> which encodes the amino 
acid sequence <SEQ ID 1690>. This protein is predicted to be transmembrane protein PstC (pstC-2).
Analysis of this protein sequence reveals the following: 

Possible site: 23

>>> Seems to have a cleavable N-term signal seq.

[Seven "Likelihood = ... Transmembrane ..." entries; their values were lost in extraction, apart from one range, 109 - 132.]

Final Results

bacterial membrane --- Certainty=0.5267(Affirmative) < succ>
bacterial outside --- Certainty=0.0000(Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000(Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database:

Query: 15 ITACVSVISAILICLFLFSSGLPAITKIGWGNFIFGKVWHPSN--NIFGIFPMIVGSLW 72
          ++ A V+V++ +LIC F+FS+GLP I G+ F+ G W P+N +GI PMIVGSL +
Sbjct: 1 MSATVAWAILLICFFIFSNGLPFIANYGFARFLLGSDWSPTNIPASYGILPMIVGSLLI 60

Query: 73
Sbjct: 61
Query: 133
Sbjct: 121
Query: 193
Sbjct: 180
Query: 253
Sbjct: 240

++G GM VL AS+LLGIMILPTI+S+E

+ALGASHERS+F

There is also homology to SEQ ID 1692. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for
vaccines or diagnostics.



Example 530 

A DNA sequence (GBSx0568) was identified in S.agalactiae <SEQ ID 1693> which encodes the amino 
acid sequence <SEQ ID 1694>. This protein is predicted to be probable hemolysin precursor (pstS). 
Analysis of this protein sequence reveals the following: 

Possible site: 34 

>>> May be a lipoprotein

Final Results

bacterial membrane --- Certainty=0.0000(Not Clear) < succ>
bacterial outside --- Certainty=0.0000(Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000(Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAD22038 GB:AF118229 phosphate binding protein PstS
[Streptococcus pneumoniae] 
Identities = 134/295 (45%) , Positives = 185/295 (62%) , Gaps = 9/295 (3%) 



Query: 1 MKKHKMLSLLAVSGLMGIGIIjAGCSNDSSSSSK---GTINIVSREEGSGTRGAFIELFGI 57
         MK KML+L A+ GL G G++A C N S++S +   GTI ++SRE GSGTRGAF E+ GI
Sbjct: 1 MKFKKMLTLAAI-GLSGFGLVA-CGNQSAASKQSASGTIEVISRENGSGTRGAFTEITGI 58

Query: 58 CNKKGEKVDHTSDAATVTNSTSW1LTTVSKDPSAIGYSSLGSLNSSVKVLKIDGKNAT 117
          C+ +K+D+T+ A + NST +L+ V + +AIGY SLGSL SVK L+IDG A+
Sbjct: 59 CDGD-KKIDNTAKTAVIQNSTEGVLSAVQGNftNAIGYISLGSLTKSVPCALEIDGVKAS 117

Query: 118 3IKSGSYKISRPFNIVTKEGKEKEATKDFIDYILSKDGQAWEKNGYIPL-DKAKAYQ 176
           + G Y + RPFNIV K +DFI tISKGQW N tl Y
Sbjct: 118 rVLDGEYPLQRPFNIVWSSNLSK-LGQDFISFIHSKQGQQWTDNKFIEAKTETTEYT 176

Query: 177 /SSGKWIAGSSSVTPVMEKIKEAYHICVNAKUDVEIQQSDSSTGITSAIDGSADIGMA 236
           SGK+ + GS+SV+ +MEK+ KAY K N +V ++I + SS GIT+ + +ADIGM
Sbjct: 177 ILSGKLSWGSTSVSSLMEKIiAEAYKKENPEVTIDITSNGSSAGITAVKEKTADIGMV 236

Query: 237 JLDKTESSKGVKATVIATDGIAVWNKKNKVNDLSTKQVKDIFTGKTTSWSDL 291
           Ih E K + IA DGIAVWN NK + +S ++ D+F+GK T+W +
Sbjct: 237


There is also homology to SEQ ID 1696. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

A related GBS gene <SEQ ID 8597> and protein <SEQ ID 8598> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: 23 Crend: 4 
McG: Discrim Score: 7.91 
GvH: Signal Score (-7.5): -3.72 

Possible site: 34 
>>> May be a lipoprotein 

ALOM program count: 0 value: 2.44 threshold: 0.0 
PERIPHERAL Likelihood = 2.44 248 
modified ALOM score: -0.99 

*** Reasoning Step: 3 

Final Results 

bacterial membrane --- Certainty=0.0000(Not Clear) < succ>
bacterial outside --- Certainty=0.0000(Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000(Not Clear) < succ>

SEQ ID 1694 (GBS24) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 14 (lane 9; MW 33kDa). 

GBS24-His was purified as shown in Figure 194, lane 10. 
Example 531 

A DNA sequence (GBSx0569) was identified in S.agalactiae <SEQ ID 1697> which encodes the amino 
acid sequence <SEQ ID 1698>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1725(Affirmative) < succ>
bacterial membrane --- Certainty=0.0000(Not Clear) < succ>
bacterial outside --- Certainty=0.0000(Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for
vaccines or diagnostics. 

Example 532 

A DNA sequence (GBSx0570) was identified in S.agalactiae <SEQ ID 1699> which encodes the amino 
acid sequence <SEQ ID 1700>. Analysis of this protein sequence reveals the following:

Possible site: 58

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2741(Affirmative) < succ>
bacterial membrane --- Certainty=0.0000(Not Clear) < succ>
bacterial outside --- Certainty=0.0000(Not Clear) < succ>
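
The "Final Results" lines follow a fixed pattern (location, certainty value, call), so they can be tabulated mechanically. The following is a minimal sketch and not part of the original analysis; the helper name `parse_localization` and the regular expression are assumptions chosen for illustration, and the pattern deliberately tolerates stray spaces inside the number.

```python
import re
from typing import Optional, Tuple

# Hypothetical helper (not from the source): matches one "Final Results" line,
# e.g. "bacterial cytoplasm --- Certainty=0.2741(Affirmative) < succ>".
_LINE = re.compile(
    r"bacterial\s+(\w+)[\s\-]*Certainty\s*=\s*([0-9][0-9.\s]*?)\s*\(([^)]*)\)"
)


def parse_localization(line: str) -> Optional[Tuple[str, float, str]]:
    """Return (location, certainty, call), or None if the line does not match."""
    match = _LINE.search(line)
    if match is None:
        return None
    location = match.group(1)
    # Drop spaces that may have crept into the number before converting it.
    certainty = float(match.group(2).replace(" ", ""))
    call = match.group(3).strip()
    return location, certainty, call


print(parse_localization(
    "bacterial cytoplasm --- Certainty=0.2741(Affirmative) < succ>"
))
```

Lines that are not localization results (for example "Possible site: 58") return `None`, so the helper can be mapped over a whole example and the non-matching lines filtered out.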

The protein has homology with the following sequences in the GENPEPT database:

>GP:BAB05069 GB:AP001511 unknown conserved protein [Bacillus halodurans]

Identities = 119/250 (47%) , Positives = 149/250 (59%) , Gaps = 9/250 (3%) 

Query: 1 MQQYFWGE--AGAYVTIEDKDTIKHMFNVMRLTEDDQVVLVFDDAIKRLAKVVDSSAHR 58 
MQ+YFV E YVTI D +KH+ VMR+T D+ L+ D R + . A+ 

Sbjct: 1 MQRYFVPKEQMTDTYVTITGDD- VKHIIKVMRMTIGDE- - LICSDGHGRTVRCEIEKAND 57



Query: 59 FQIL EELDNNvEMPVQvTIASGFPKGDKLDFVTQKATELGAAAIWGFPADWSWKW 114 

++L EL N E+P++VTIA PKGDKLD++ QK TELGA A W F A S+VKW 
Sbjct: 58 SEVLARVIEPLIPNTELPIRVTIAQALPKGDKLDYIVQKGTELGAQAFWPFSASRSIVKW 117 

Query: 115 DGKKIJ^KEDKLAKIALGA^QSIOa^PQVRLFEKKADFQAELAGFDKIFIAYEESAKE 174 

D KK KK ++L KIA AAEQS R R+P + + E++GF K +AYEE AKE 

Sbjct: 118 DEKKGRKKTERLMKIAKEAAEQSYRERIPSIETPLAFSKLLQEISGFTKTIVAYEEEAKE 177 

Query: 175 GELSALAQNLQTVKAGDKLLFI FGPEGGI SPKE IAAFEEVGAI KVGLGPRIMRTETAPLY 234 

G L A L + GD Lti I GPEGG + +EI A + G GLGPR I + RTETA LY 
Sbjct: 178 GRLMTFAACLNELHHGDSLLVI1GPEGGFTTEEIDAIQRAGGAPAGLGPRILRTETASLY 237 

Query: 235 ALSVISYSAE 244 

AL+ ISY E 
Sbjct: 238 ALAAISYHFE 247 



A related DNA sequence was identified in S.pyogenes <SEQ ID 1701> which encodes the amino acid 

sequence <SEQ ID 1702>. Analysis of this protein sequence reveals the following: 

Possible site: 56

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2274(Affirmative) < succ>
bacterial membrane --- Certainty=0.0000(Not Clear) < succ>
bacterial outside --- Certainty=0.0000(Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 173/245 (70%), Positives = 202/245 (81%) 


Query: 1 MQQYFTOGEAGAYVTIEDKDTIKHMFfcWmiiTE^ 60 

MQQYF+ G+A VTI DKDTIKHMF VMRL ++ +WLVFDD +K LAKV +S AH + 
Sbjct: 1 MQQYFIKGKAEKKOTITDKDTIKHMFQVMRLADEAEVVLVF 60 

Query: 61 ILEELDNNVEMPVQVTLASGFPKGDKLDFV^QKATELGAAAIWGFPADWSVWmGia^ 120

I+E L + VE+PV+VTIASGFPKGDKLD + QK TELGA+A+WG+ PADWS WKWDGKKLA 
Sbjct: 61 IIEALPDQVELPVKVTIASGFPKGDKLDTIAQKVTELGASALWGYPADMSWKTOGiaaA 120 

Query: 121 [OCEDKLAKIALGAAEQSKRNRLPQWLFEKKADFQAELAGFDKIFIAYEESAKEGELSAL 180

KKEDKLAKI LGAAEQSKRNR+P+V LFE KA+F L+ FD IFIAYEE+AK G+L+ L
Sbjct: 121 KKEDK^KIVLGAAEQSKRNRVPEVHLFEHICREFLKSLSSFDHIFIAYEETAKAGQIATL 180 

Query: 181 AQNLQTVKAGDKIiI,FlFGPEGGISPKEIAAFEEV5AIKVGLGPRlMRTETAPLYAI.SVIS 240 

A+ ++ VK G K+LFIFGPEGGISP EI FE AIKVGLGPRIMR ETAPLYALS +S 
Sbjct: 181 AREVKEVKPGAKILFIFGPEGGISPTEITQFFAASAIKVGLGPRIMRAETAPLYALSALS 240 

Query: 241 YSAEL 245 

Y+ EL 
Sbjct: 241 YALEL 245 
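
In these alignments the middle line records the comparison column by column: a letter marks an identical residue, "+" marks a conservative (positive) substitution, and a blank marks a mismatch or gap. As a minimal sketch under that reading (the function name `count_matches` is hypothetical and not part of the original analysis), the counts reported in an alignment header could be tallied from the middle lines as follows.

```python
from typing import Tuple


def count_matches(midline: str, width: int) -> Tuple[int, int]:
    """Return (identities, positives) for one block of a BLAST-style alignment.

    `midline` is the line printed between Query and Sbjct; `width` is the
    number of alignment columns in that block.
    """
    # Pad or trim so that trailing blanks lost in transcription do not
    # change the number of columns examined.
    columns = midline.ljust(width)[:width]
    identities = sum(1 for ch in columns if ch.isalpha())
    # Positives include identities plus conservative substitutions ("+").
    positives = identities + columns.count("+")
    return identities, positives


# Final block of the alignment above: Query YSAEL, Sbjct YALEL, middle line "Y+ EL".
identities, positives = count_matches("Y+ EL", 5)
print(f"Identities = {identities}/5, Positives = {positives}/5")
```

Summing the per-block counts over all blocks of an alignment gives the totals for the header line, provided the middle lines have been transcribed without lost or inserted characters.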

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 533 

A DNA sequence (GBSx0571) was identified in S.agalactiae <SEQ ID 1703> which encodes the amino 
acid sequence <SEQ ID 1704>. Analysis of this protein sequence reveals the following: 
Possible site: 34

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -2.28 Transmembrane 238 - 254 ( 237 - 254)

Final Results

bacterial membrane --- Certainty=0.1914(Affirmative) < succ>
bacterial outside --- Certainty=0.0000(Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000(Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:BAA82791 GB:AB023064 orf35 [Listeria monocytogenes] 
Identities = 138/309 (44%) , Positives = 193/309 (61%) , Gaps = 5/309 (1%) 

Query: 4 VWELTVHvNREAEEAVSNLLIETGSQGVAISDSADYLGQ-EDRFGELYP EVEQSDMI 59 

W+E+ VH EA E V+N+L E G+ GV+I D AD+L + ED+FGE+Y E D + 
Sbjct: 3 WSEVEVHTTNEAVEPVANVLTEFGAAGVSIEDVADFLREREDKFGEIYALRREDYPEDGV 62 

Query: 60 AITAYYPDTLDIEAVKADLADRLANFEGFGIjATGSVNLDSQELVEEDWADNWKKYYEPAR 119 

I AY+ T + ++ L N F + G ++ +E+WA WKKYY P + 

Sbjct: 63 1 1 KAYFLKTTEFVEQ I PE IEQTLKNLST FD I PLGKFQFWND VDDEEWATAWKKYYHPVQ 122 

Query: 120 ITHDLTIVPSWTDYEAKAGEK1IKMDPGI4AFGTGTHPTTKMSLFALEQVLRGGETV1DVG 179 

IT +TIVPSW Y A E 11+ +DPGMAFGTGTHPTT++ + AL L+ G+ VIDVG 
Sbjct: 123 ITDRITIVPSWESYTPSANEIIIELDPGI'ffiFGTGTHPTTQLClRALSNYLQPGDEVIDVG 182 

Query: 180 TGSGVLSIASSLLGAKDIYAYDLDDVAWVAQENIDMNPGTENIHVAAGDLLKGVQQ-EV 238 

TGSGVLSIAS+ LGAK I A DLD++A R A+ENI +N IV +LL+ + + V 

Sbjct: 183 TGSGVLSIASAKLGAKSILATDI£)EIATRAAEENITLNKTEHIITVKQ1^LQDINK™V 242 

Query: 239 DVIVANILADILIHLTDDAYRLVKDEGYLIMSG3ISEKWDMVRESAEKAGFFLETHMVQG 298 

D++VANILA++++ +D Y+ +K G I SGII +K +V E+ + AG +E QG 
Sbjct: 243 DI vVANILAEVTLLFPEDVYKALKPGGVFIASGI ISDKAKWEEALKNAGLI IEKMEQQG 302 

Query: 299 EWNACVFKK 307 

+W A + K+ 
Sbjct: 303 DWVAIISKR 311 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1705> which encodes the amino acid 
sequence <SEQ ID 1706>. Analysis of this protein sequence reveals the following: 

Possible site: 34
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -4.57 Transmembrane 238 - 254 ( 237 - 257) 

Final Results 



WO 02/34771 



PCT/GB01/04789 



bacterial membrane --- Certainty=0.2826(Affirmative) < succ>
bacterial outside --- Certainty=0.0000(Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000(Not Clear) < succ>

The protein has homology with the following sequences in the databases:



Query: 4 WQEVTVHVHRDAQEAVSHVLIETGSQGVAIADSADYIGQK-DRFGELYP DVEQSDMI 59 

W EV VH +A E V++VL E G4 GV+I D AD++ 44 D+FGE+Y 4 D + 
Sbjct: 3 WSEVKVHTTNEAVEPVANVLTEFGAAGVSIEDVADFLREREDKFGEIYALRREDYPEDGV 62 

Query: 60 AITAYYPSSTNLADIIATINEQLAELASFGLQVGQVTVDSQELAEEDVIADNWKKYYEPAR 119 

I AY4- +T 4-1 I + L L++F + +G4 ++ 4E+WA WKKYY P 4 

Sbjct: 63 I IKAYFLKTTEFVEQI PEIEQTLKNLSTFDI PI 1 GKFQF , 7VNDVDDEEWATAWKKYYHPVQ 122 

Query: 120 1THDLTIVPSWTDYDASAGEKVIKLDPGMAFGTGTHPTTKMSLFALEQILRGGETVIDVG 179 

IT +TIVPSW Y SA E +I+LDPGMAFGTGTHPTT++ + AL L+ G+ VIDVG 
Sbjct: 123 ITDRITIVPSWESYrPSANEIIIELDPGMAFGTGTHPTTQLCIRALSNYLQPGDEVIDVG 182 

Query: 180 TGSGVLSIASSLLGAKTIYAYDLDDVAVRVAQDKIDLNQGTDNIHVAAGDLLKGVSQ-EA 238 

TGSGVLSIAS+ LGAK+I A DLD++A R A++NI IM+ IV +LL+ +++ 
Sbjct: 183 TGSGVLSIASAKLGAKSIIATDLDEIATRAAEENITLNKTEHIITVKQNNLLQDINKTNV 242 

Query: 239 DVIVANILADILVLLTDDAYRLVKKEGYLILSGIISEKLDMVLEAAFSAGFFLETHMVQG 298 

D++VANILA++++L +D Y+ +K G I SGII +K +V EA +AG 4E QG 
Sbjct: 243 DIWANILAEVILLFPEDVYKALKPGGVFIASGIIEDKAKWEEALKNAGLIIEKMEQQG 302 

Query: 299 EWNALVFKK 307 

+W A++ K+ 
Sbjct: 303 DWVAIISKR 311 

An alignment of the GAS and GBS proteins is shown below: 

Identities = 259/317 (81%) , Positives = 287/317 (89%) 

Query: 1 MNTWNELTvHVNREAEEAVSNLLIETGSQGVAISDSADYLGQEDRPGELYPEVEQSDMIA 60 

M TW E+TVHV+R+A+EAVS++LIETGSQGVAI+DSADY+GQ+DRFGELYP+VEQSDMIA 
Sbjct: 1 METWQEWVHVHRDAQEAVSHVLIETGSQGVAIADSADYIGQKDRFGELYPDVEQSDMIA 60 

Query: 61 ITAYYPDTLDIEAVKADLADRLANFEGFGLATGSVNLDSQELVBEDWADNWKKYYEPARI 120 

ITAYYP + ++ + A + ++LA FGL G V +DSQEL EEDWADN11KKYYEPARI 
Sbjct: 61 ITAYYPSSTNIADIIATIIffiQLAEIASFGLQVGQVTvDSQEIiAEEDWADNl^KKYYEPARI 120 

Query: 121 THDLTIVPSWTDYEAKAGEKIIKMDPGMAFGTGTHPTTKMSLFALEQVLRGGETVIDVGT 180 

THDLTIVPSWTDY+A AGEK+IK4DPGMAFGTGTHPTTKMSLFALEQ+LRGGETVIDVGT 
Sbjct: 121 THDLTIVPSWTDYDASAGEKVIKLDPGMAFGTGTHPT7KMSLFALEQILRGGETVIDVGT 180 

Query: 181 GSGVLSIASSLLGAKDIYAYDLDDVAVRVAQENIDMNPGTENIHVAAGDLLKGVQQEVDV 240 

GSGVLSIASSLLGAK IYAYDLDDVAVRVAQ4NID4N GT4NIHVAAGDLLKGV QE DV 
Sbjct: 181 GSGVLSIASSLLGAKTIYAYDLDDVAVRVAQDNIDLNQGTDNIHVARGDLLKGVSQEADV 240 

Query: 241 IVANIIlMII.IHL^DDAYRLvlCDEGYLIMSGrISEKWDMVRESAEt<AGFFI l ETH^W■QGEW 300 

IVANILADIL4 LTDDAYRLVK EGYLI 4SGI I SEK DMV E4A AGFFLETHMVQGEW 
Sbjct: 241 IVANILADILVLLTDDAYRLVKKEGYLILSGIISEKLDMVLSAAFSAGFFLETHMVQGEW 300 

Query: 301 NACVFKKTDDISGVIGG 317 

NA VFKKTDDISGVIGG 
Sbjct: 301 NALVFKKTDDISGVIGG 317 



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 






Example 534 

A DNA sequence (GBSx0572) was identified in S.agalactiae <SEQ ID 1707> which encodes the amino 
acid sequence <SEQ ID 1708>. Analysis of this protein sequence reveals the following: 

Possible site: 61
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4198(Affirmative) < succ>
bacterial membrane --- Certainty=0.0000(Not Clear) < succ>
bacterial outside --- Certainty=0.0000(Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics.

Example 535 

A DNA sequence (GBSx0573) was identified in S.agalactiae <SEQ ID 1709> which encodes the amino 
acid sequence <SEQ ID 1710>. This protein is predicted to be transcriptional activator tipa. Analysis of this 
protein sequence reveals the following: 

Possible site: 33

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0683(Affirmative) < succ>
bacterial membrane --- Certainty=0.0000(Not Clear) < succ>
bacterial outside --- Certainty=0.0000(Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:CAB15677 GB:Z99122 transcriptional regulator [Bacillus subtilis]
Identities = 87/246 (35%), Positives = 139/246 (56%), Gaps = 13/246 (5%)





Query: 4 VKEVSILSGVSVRTLHHYDKIGLFPPTALSEAGYRLYDDEALIRLQEILLFRELEFPLKD 63
         VK+V+ +SGVS+RTLHHYD I L P+AL++AGYRLY D L RLQ+IL F+E+ F L +
Sbjct: 5 VKQVAEISGVSIRTLHHYDNIELLNPSALTDAGYRLYSDADLERLQQILFFKEIGFRLDE 64

Query: 64 IKYBLEQAKEERQD1 1 LAQQ1KLLEWKRSHLEQVITHAKR--LQEKGDDYMN FDVYN 117
          IK +L+ +R+ L Q ++L K+ ++4+I R L G 4 MN F +
Sbjct: 65 IKEMLDHPNFDRKAALQSQKEILMKKKQRMDEMIQTIDRTLLSVDGGETMNKRDLFAGLS

Query: 118 KTELEQLQA EAKEKWGQTAA--YKEFAQKHASDDFAQISQEMAKIMVQFGQLKTQN 171
           ++E+ Q E ++ +G+ A ++ +++DD+ IE I +
Sbjct: 125 MKDIEEHQQTYADEVRKLYGKEIAEETEKRTSAYSADDTOTIMAEFDSIYRRIAARMKHG 184

Query: 172 VSDESVQMCVKRLQDYISQNFYTCTNEILAGLGQMYCSDDRFSQSIDKAGGAGTSEFVSQ 231
           D +Q V +D+I Q Y CT +1 GLG++Y +D+RF+ SI++ G G + F+ +
Sbjct: 185 PDDAEIQAA.VGAFRDHICQYHYDCTLDIFRGLGEVYITDERFTDSINQY-GEGBAAFLRE 243

Query: 232 AIAYYC 237
           AI YC
Sbjct: 244 AIIIYC 249





A related DNA sequence was identified in S.pyogenes <SEQ ID 1711> which encodes the amino acid
sequence <SEQ ID 1712>. Analysis of this protein sequence reveals the following: 

Possible site: 48

>>> Seems to have no N-terminal signal sequence

Likelihood = -8.28 Transmembrane
Likelihood = -2.92 Transmembrane

Final Results

bacterial membrane --- Certainty=0.4312(Affirmative) < succ>
bacterial outside --- Certainty=0.0000(Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000(Not Clear) < succ>

The protein has homology with the following sequences in the databases:

>GP:CAB15677 GB:Z99122 transcriptional regulator [Bacillus subtilis] 
Identities = 40/107 (37%) , Positives = 69/107 (64%) , Gaps = 6/107 (5%) 

Query: 7 YSTGELANLAGVSIRTVQYYDQRGILIPTALTAGGRRLYTDSDLEQLRMICFLRDLGFSI 66 
         Y ++A ++GVSIRT+ +YD +L P+ALT G RLY+D+DLE+L+ I F +++GF +

Sbjct: 3 YQWQVAEISGVSlRTLHHYDNIELIiNPSALTDAGYRLYSDADLERLQQILFFKEIGFRL 62 

Query: 67 EQIRKVLAEENAAQVLELLLVDHIATAKEDLAAKEQQVDIAVKILDR 113 
++I+++L N + L + KE L K+Q++D ++ +DR 

Sbjct: 63 DE I KEMLDHPNFDRKAAL QSQKEILMKKKQRMDEMIQTIDR 103



An alignment of the GAS and GBS proteins is shown below: 

Identities = 40/133 (30%) , Positives = 71/133 (53%) , Gaps = 6/133 (4%) 

Query: 6 EVSILSGVSVRTLHHYDKIGLFPPTALSEAGYRLYDDEALIRLQEILLFRELEFPLKDIK 65

E++ L+GVS+RT+ +YD+ G+ PTAL+ G RLY D h +L+ I R+L F ++ 1+ 
Sbjct: 11 ELANLAGVSIRTVQYYDQRGILIPTALTAGGRRLYTDSDLEQLRMICFLRDLGFSIEQIR 70 

Query: 66 YLL- -EQAKEERQDLLAQQIKL LEWKRSHLEQVITHAKRLQEKGDDYMNFDVYNKT 119 

          +L E A + + LL I L K ++ + RL+++ ++F +

Sbjct: 71 KWIAEEKAAQVIjELLLvDHIATAKEDLAAKEQQVDIAVKILDRLRKQDPQSLDFLMDISL 130 

Query: 120 ELEQLQAEAKEKW 132 
++ +A K +W 
Sbjct: 131 SMKNQKAWKKLQW 143

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 



Example 536 

A DNA sequence (GBSx0575) was identified in S.agalactiae <SEQ ID 1713> which encodes the amino
acid sequence <SEQ ID 1714>. Analysis of this protein sequence reveals the following: 

Possible site: 24

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -0.06 Transmembrane 57 - 73 ( 57 - 73)

Final Results

bacterial membrane --- Certainty=0.1022(Affirmative) < succ>
bacterial outside --- Certainty=0.0000(Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000(Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:CAB14586 GB:Z99117 yrkN [Bacillus subtilis]
Identities = 38/136 (27%), Positives = 60/136 (43%), Gaps = 3/136 (2%)

Query: 2 ITLQKAEASDLEKI IA- IQRASFKAVYEKYHDQYDPYVEEVEQIRWKLVERPDCFYHFVL 60

+ L+ A+ SDL + +Q A AV E + D D + ++ + P + +L 

Sbjct: 9 VILELAKESDLPEFQKKLQEAFAIAVIETFGDCEDGPI PSDNDVQ - ESFNAPGAWYHI L 67 



Query: 61 VDETIVGFLRLVIKDEEKRAWLGTAAILPQYQGCGYGSA7AMALLEKTYPKLTKWDLCTIA 120

Query: 121 QEKLMVSFY-EKCGYH 135

EK ++FY KCG+H 
Sbjct: 128 FEKRNINFYVNKCGFH 143 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

Example 537 

A DNA sequence (GBSx0576) was identified in S.agalactiae <SEQ ID 1715> which encodes the amino 
acid sequence <SEQ ID 1716>. This protein is predicted to be Bacterial mutT protein. Analysis of this 
protein sequence reveals the following: 
Possible site: 13

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2417(Affirmative) < succ>
bacterial membrane --- Certainty=0.0000(Not Clear) < succ>
bacterial outside --- Certainty=0.0000(Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 



Query: 10 FSGAK1ALFCEGKILTSLRDDFPDLPYAGPVJDLPGGGREDNETPLECLFREVDEELSLTL 69 

FSGAK+ALF ++ RD+ P 4P+ G+WD PGGGRE ETP EC RE++EE S+ L 
Sbjct: 7 FSGAKLALFYGDHLWYI<KDEKPGIPFPGYWDFPGGGREGLETPAECALRELEEEFSIRL 66 

Query: 70 TRNHIDWVKTYRGMIJKPDKLSVF^IVGHISQKEYDSIVLGDEGQDYKLMSIDEFLSHICKVI 129 

I+W + Y + F+V + +E+4 + I GDEGQ ++LM +D +L+H + 

Sbjct: 67 EEPRIEWQRQYPSTSGSAPFAYFLVARLEDREFEAIRFGDEGQYWRLMEVDAYLAHAMAV 126 

Query: 130 PQLQERLRDYL 140 

P LQ RL DYL 
Sbjct: 127 PYLQSRLGDYL 137 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

Example 538 

A DNA sequence (GBSx0577) was identified in S.agalactiae <SEQ ID 1717> which encodes the amino 
acid sequence <SEQ ID 1718>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3299(Affirmative) < succ>
bacterial membrane --- Certainty=0.0000(Not Clear) < succ>
bacterial outside --- Certainty=0.0000(Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.

A related DNA sequence was identified in S.pyogenes <SEQ ID 1719> which encodes the amino acid
sequence <SEQ ID 1720>. Analysis of this protein sequence reveals the following: 

Possible site: 41

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.5527(Affirmative) < succ>
bacterial membrane --- Certainty=0.0000(Not Clear) < succ>
bacterial outside --- Certainty=0.0000(Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 111/156 (71%) , Positives = 128/156 (81%) 

Query: 1 ^KFGFLSVIjEEELDKHLQYDFANIDWDKKNHTVEVTFILEAQNSSAIETVDDQGETSSED 60 
         MA +GFLSVLEEE+DKH QYD+AMDWDKKNH VEVTF+LEAQN AI+T+DD GE + +D

Sbjct: 1 ^TYGFLSVLEEEMDKHFQYDYAMDTOKKKHAVEVTFVLEAQNKEAIKTIDDSGEVTQDD 60 

Query: 61 IVFEDYVLFYNPVKSRFDAEDYLVTIPYEPKKGLSREFLAYFAETliNEVATEGLSDLMDF 120 
IVFEDYVLFYNP KS+FDA DYLVTIP++ KKG SREFIAYFA+ LN+VA EG SDLMDF 
Sbjct: 61 IVFEDYVLFYNPAKSQFDAADYLVTIPFDAKKGFSREFLAYFAQFLNDVAIEGHSDLMDF 120

Query: 121 LTDDSIEEFGLSWDTDAFENGRAELKETEFYPYPRY 156 
           L DDS +F L W+ AFE G+ L+E YPYPRY

Sbjct: 121 LADDSKADFFLEWNAQAFEEGQQGLEEAASYPYPRY 156 


Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 539 

A DNA sequence (GBSx0578) was identified in S.agalactiae <SEQ ID 1721> which encodes the amino 
acid sequence <SEQ ID 1722>. Analysis of this protein sequence reveals the following:

Possible site: 26

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2846(Affirmative) < succ>
bacterial membrane --- Certainty=0.0000(Not Clear) < succ>
bacterial outside --- Certainty=0.0000(Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:CAB51273 GB:AL096872 putative acetyltransferase [Streptomyces coelicolor A3(2)]

Identities = 35/109 (32%) , Positives = 62/109 (56%) , Gaps = 1/109 (0%) 

Query: 51 VAEVDDKIAGVLDFGPYYPFPAGKHVATF-GILIAEPYQGQGLGKALLKALLTEAKAQGY 109 
          VAE+D + G + G P + HV G+ +A +G G+G+AL++A + EA+ +G+

Sbjct: 56 VAELDGAWGYTOLGFPTPIASNTHWQ1RGLAVAGAARGHGVGRALVRAAVEEARHEGF 115 

Query: 110 IKIAMEWMGNNSRAISLYQKYGFTEEARITKAFFIE^fflYVDALIFAKDL 158 
+1 + V+G+N+ A LY+ GF E + F ++ YVD ++ + h 
Sbjct: 116 RRITLRVLGHNTAARGLYESEGFVVEGVQPEEFHLDGRYVDDVLMGQML 154

A related DNA sequence was identified in S.pyogenes <SEQ ID 1723> which encodes the amino acid 
sequence <SEQ ID 1724>. Analysis of this protein sequence reveals the following: 

Possible site: 18
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0229(Affirmative) < succ>
bacterial membrane --- Certainty=0.0000(Not Clear) < succ>
bacterial outside --- Certainty=0.0000(Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below:

Identities = 34/108 (31%), Positives = 59/108 (54%), Gaps = 7/108 (6%) 

Query: 35 TESDLEKNLANGMSFFV AEVDDKIAGVLDFGPYYPFPAGKHVATFGIL1AEPYQG 89 

T +L L+ + F4 A +D+K+ G+L+ G+ A +L+A+ Y+G 

Sbjct: 43 TPQELSDFLSRSQTSFIDFCLLARLDEICWGLLNLSGEV-LSQGQAEADVFMLVAKTYRG 101

Query: 90 QGLGKAIiLKALLTElAKAQGYIK-IAMHVMGiOTSRAISLYQKYGFTEEA 136 

G+G+ LL4- L A+ YI+ 4- + V N++AI LY+KYGF E+ 
Sbjct: 102 YGIGQLLLEIALDWAEENPYIESLKLDVQVRNTKAIYLYKKYGFRIES 149 


Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 540 

A DNA sequence (GBSx0579) was identified in S.agalactiae <SEQ ID 1725> which encodes the amino 
acid sequence <SEQ ID 1726>. Analysis of this protein sequence reveals the following:

Possible site: 45

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2056(Affirmative) < succ>
bacterial membrane --- Certainty=0.0000(Not Clear) < succ>
bacterial outside --- Certainty=0.0000(Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAB14712 GB:Z99118 similar to hypothetical proteins [Bacillus subtilis] 
Identities = 248/417 (59%), Positives = 314/417 (74%), Gaps = 4/417 (0%) 

Query: 5 IiALRMRPRNINEVIGQQHLVGNGKIIDRMVAANMLSSMILYGPPGlGKTSIASAIAGTTK 64 

LA RMRP I ++IGQQHLV KII RMV A LSSMILYGPPGIGKTSIA+AIAG+T 
Sbjct: 4 LAYRMRPTKIEDIIGQQHLVAEDKIIGRMVQAKHJjSSMILYGPPGIGKTSIATAIAGSTS 63 

Query: 65 YAFRTFNATVDSKKRLQEIAEEAKFSGGBVLLLDEIHRLDKTKQDFLLPLLENGNIIMIG 124 

AFR NA +++KK ++ +A+EAK SG ++L+LDE+HRLDK KQDFLLP LENG II+IG 
Sbjct: 64 IAFRKLNAVINNKKDMEIVAQEAKMSGQVILILDEVHRIiDKGKQDFLLPYLENGMIILIG 123 

Query: 125 ATTENPFFSVTPAIRSRVQIFELEPLSNEDIKKAIQLAISDKERGF-PFLVTIDDEALDF 183 
ATT NP+ ++ PAIRSR QIFELEPL+ E IK+A++ A+ D+ RG + V+IDD+A++ 

Query: 184 IVTATNGDLRSAYNSLDLAVMSTSPNEDGSRHISLETMENSLQCSYITMDKNGDGHYDIL 243 

GD+RSA N+L+LAV+ST +• DG HI+LET E LQ ■(- DK+GD HYD+L 
Sbjct: 184 FAHGCGGDTOSALNALELAVLSTKESADGEIHITLETAEECLQKKSFSHDKDGDAHYDVL 243 

Query: 244 SALQKS IRGSDWASLHYAARLVEAGDLPSLARRLTI IAYEDIGLANPEAQIHTVTALEA 3 03 

SA QKSIRGSD NA+LHY ARL+EAGDL S+ARRL +IAYEDIGLA+P+A + A++ 
Sbjct: 244 SAFQKSIRGSDANAALHYJ^LIEAGDLESIARRLLVIAYEDIGIASPQAGPRVLNAIQT 303 

Query: 304 AQRIGFPEARILIANIVVDIALSPKSNSAYliAMDAALADLRRSGNLPIPRHLRDGHYSGS 363 

A+R+GFPEARI +AN V++L LSPKSNSA LA+D ALAD+R +P+HL+D HY G+ 

Sbjct: 304 AERVGFPFARIPLANAVIELCLSPKSN8AIIAIDEM.ADIRAGKIGDVPKHIJKDAHYKGA 363 

Query: 364 KTLG^IARDYKyPHAYPEKWWQQYLPDKLVGHNYFEANETGKYERALGSNKERIDKL 420 

+ LG DYKYPH Y WV+QQYLPD L Y++ +TGK+E AL K+ D2CL 
Sbjct: 364 QELGRGIDYKYPHNYDNGWVEQQYLPDPLKNKQYYKPKQTGKFESAL KQVYDKL 417

A related DNA sequence was identified in S.pyogenes <SEQ ID 1727> which encodes the amino acid
sequence <SEQ ID 1728>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2374(Affirmative) < succ>
bacterial membrane --- Certainty=0.0000(Not Clear) < succ>
bacterial outside --- Certainty=0.0000(Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 394/422 (93%) , Positives = 409/422 (96%) 

Query: 1 MADNLALRMRPRNINEVIGQQHLVGNGKIIDRMVAANMLSSMILYGPPGIGKTSIASAIA 60 

M D+LALRMRP+ I+EVIGQ+HLVG GKII RMV AN LSSMILYGPPGIGKTSIASAIA 
Sbjct: 1 MPDHLALRMRPKTISEVIGQKHLVGEGKIlRRNIV3ANRLSSMILyGPPGIGKTSIASAIA 60 

Query: 61 GTTKYAFRTFNATVDSKKRLQEIAEEAKFSGGLVLLLDEIHRLDKTKQDFLLPLLENGNI 120 

GTT+YAFRTFNAT+DSKKRLQEIAEEAKFSGGLVLLLDEIHRLDKTKQDFLLPLLENG I 
Sbjct: 61 GTTRYAFRTFNATIDSKKRLQEIAEEAKFSGGLVLLLDEIHRLDKTKQDFLLPLLENGTI 120

Query: 121 IMIGATTENPFFSVTPAIRSRVQIFELEPLSNEDIKKAIQLAISDKERGFPFLVTIDDEA 180 

IMIGATTENPFFSVTPAIRSRVQIFELEPLSNEDIK AIQLAISDKERGFPFLVTIDDEA 
Sbjct: 121 IMIGATTENPFFSVTPAIRSRVQIFELEPLSNEDIKTAIQLAISDKERGFPFLVTIDDEA 180



Query: 241 DILSALQKSIRGSDVNASLHYAARLVEAGDLPSLARRLTIIAYEDIGLANPEAQIHTVTA 300
D+LSALQKSIRGSDVNASLHYAARLVEAGDLPSLARRLTIIAYEDIGLANP+AQ+HTVTA
Sbjct: 241 DVIjSALQKSIRGSDvTSIASLHYAARLVEAGDLPSLARRLTIIAYEDIGIANPDAC^/HTVTA 300

Query: 301 LEAAQRIGFPEARILIANIVVDLALSPKSNSAYLAMDAALADLRRSGNLPIPRHLRDGHY 360
L+AAQRIGFPEARI IAN+V+DLALSPKSNSAYLAMDAALADLR SGNLPIPRHLRDGHY
Sbjct: 301 LDAAQRIGFPEARIPIANVVIDLALSPKSNSAYLAMDAALADLRTSGNLPIPRHLRDGHY 360

Query: 361 SGSKTLGNARDYKYPHAYPEKWVKQQYLPDKLVGHNYFEANETGKYERALGSNKERIDKL 420
+GSK LGNA+DY YPHAYPEKWVKQQYLPDKLVGH+YFEANETGKYERALGSNKERIDKL
Sbjct: 361 AGSKDLGNAKDYLYPHAYPEKWVKQQYLPDKLVGHHYFEANETGKYERALGSNKERIDKL 420

Query: 421 SD 422
SD
Sbjct: 421 SD 422
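
Each alignment is preceded by a summary line giving identity and similarity counts over the aligned length. A minimal sketch of how such a line could be read programmatically is given below; the regular expression and the names `parse_summary` and `fraction` are illustrative assumptions, and the percentages printed in the listings are not recomputed here because the rounding convention of the alignment program is not stated in this document.

```python
import re

# Matches "Identities = 394/422", "Positives = 409/422", "Gaps = 8/324"
SUMMARY_RE = re.compile(r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)")

def parse_summary(line):
    """Return {'identities': (count, length), ...} for one summary line."""
    return {
        name.lower(): (int(count), int(length))
        for name, count, length in SUMMARY_RE.findall(line)
    }

def fraction(pair):
    """Convert a (count, length) pair to a fraction between 0 and 1."""
    count, length = pair
    return count / length

stats = parse_summary("Identities = 394/422 (93%) , Positives = 409/422 (96%)")
identity_fraction = fraction(stats["identities"])
```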

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 541 

A DNA sequence (GBSx0580) was identified in S.agalactiae <SEQ ID 1729> which encodes the amino 
acid sequence <SEQ ID 1730>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2991 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10207> which encodes amino acid sequence <SEQ ID
10208> was also identified.

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 542 

A DNA sequence (GBSx0581) was identified in S.agalactiae <SEQ ID 1731> which encodes the amino 
acid sequence <SEQ ID 1732>. Analysis of this protein sequence reveals the following: 
Possible site: 29 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2402 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 
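
Every Example opens with the same sentence linking a GBSx identifier to a nucleotide SEQ ID and a protein SEQ ID. The sketch below shows one way these identifiers could be pulled out of that sentence; the pattern and the function name `parse_example_header` are assumptions made for illustration only.

```python
import re

# Assumed wording of the opening sentence of each Example
HEADER_RE = re.compile(
    r"A DNA sequence \((?P<gene>GBSx\d+)\) was identified in (?P<species>S\.\s?\w+) "
    r"<SEQ ID (?P<dna>\d+)> which encodes the amino\s+acid sequence <SEQ ID (?P<protein>\d+)>"
)

def parse_example_header(text):
    """Return gene name, species and the two SEQ ID numbers, or None if no match."""
    m = HEADER_RE.search(text)
    if m is None:
        return None
    return {
        "gene": m.group("gene"),
        "species": m.group("species"),
        "dna_seq_id": int(m.group("dna")),
        "protein_seq_id": int(m.group("protein")),
    }

header = (
    "A DNA sequence (GBSx0581) was identified in S.agalactiae <SEQ ID 1731> "
    "which encodes the amino\nacid sequence <SEQ ID 1732>."
)
record = parse_example_header(header)
```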

Example 543 

A DNA sequence (GBSx0582) was identified in S.agalactiae <SEQ ID 1733> which encodes the amino 
acid sequence <SEQ ID 1734>. Analysis of this protein sequence reveals the following: 



>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood =-10.40 Transmembrane 231 

INTEGRAL Likelihood = 

INTEGRAL Likelihood = 

INTEGRAL Likelihood = 

INTEGRAL Likelihood = 

INTEGRAL Likelihood = 



92 Transmembrane 159 - 175 

08 Transmembrane 21 - 37 

08 Transmembrane 181 - 197 

35 Transmembrane 111 - 127 

81 Transmembrane 74 - 90 



Final Results 

bacterial membrane --- Certainty=0.5161 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAB15891 GB:Z99123 yxlG [Bacillus subtilis]

Identities = 54/203 (26%) , Positives = 100/203 (48%) , Gaps = 7/203 (3%) 

Query: 1 MTGLIPMLKKEWLENSRSHKAI>ALLLISIIFGILGPLTALLMPEIMA--GILPKKLQEAI 58 
M ++ +L+KEWLE +S K + L + +1 G+ PLT MPEI+A G LP ++ + 
Sbjct: 1 MKVMMALLQKEWLEGWKSGKLIWLPIAMMIVGLTQPLTIYYMPEIIAHGGNLPDGMKISF 60

Query: 59 PDPTYLDSYSQYFKNINQLGLILLVFLFSGSLTQEFTRGTLINLITKGLSKKAIILAKFI 118 

P+ + N LG+ L++F GS+ E +G ++++ ++ I++K++ 

Sbjct: 61 TMPSGSEVMVSTLSQFNTLGMALVIFSVt'IGSVA^RKQGV^ALIMSRPVTAAHYIVSKWL 120 
Query: 119 MMTLIWSISYILGSLTQYAYTLYYFMffiGQHKLIV-YGTSWIFGLLLLSLILFYSVIFRK 177

+ ++I +S+ G Y Y F + + G ++ + +++ I) S IFR ■ 

Sbjct: 121 IQSVIGIMSFAAGYGLfiYYYVRLLFEDASFSRFiyiSLGLYALWVIFIVTAGLRGSTIFR- 179 

Query: 178 TAGVLIAC LMTIVAFFISGF 197

+ G AC L V+F 4 F 
Sbjct: 180 SVGAAAACGIGLTAAVSFAVHYF 202 

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 544 

A DNA sequence (GBSx0583) was identified in S.agalactiae <SEQ ID 1735> which encodes the amino 
acid sequence <SEQ ID 1736>. This protein is predicted to be ABC transporter, ATP-binding protein. 
Analysis of this protein sequence reveals the following:

Possible site: 61 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1344 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAB15892 GB:Z99123 similar to ABC transporter (ATP-binding protein) [Bacillus subtilis]
Identities = 116/303 (38%), Positives = 175/303 (57%), Gaps = 18/303 (5%)

Query: 4 ISLQNLSKSFGDQIILNQVSLELEENKIYGFVGPNGAGKTTTIKMILGLLKVDSGTISVM 63
+S+++L KS+ + VS + EN+ +GPNGAGKTTT++M+ GLL SGTI ++
Sbjct: 2 LSIESLCKSYRHHEAVKNVSFHVNENECVALLGPNGAGKTTTLQMLAGLLSPTSGTIICLL 61

Query: 64 GNPVTFGQTKSNQVIGYLPDVPEFYDYMTAQEYLQLC- - -AGLAQNKTSLPIADLLEQVG 120
G + ++IGYLP P FY +MTA E+L +GL++ K I ++LE VG
Sbjct: 62 GE KKLDRRLIGYLPQYPAFYSWMTANEFLTFAGRLSGLSICRKCQEKIGEMLEFVG 116

Query: 121 LADN-QQRISTYSRGMKQRLGLAQALIHNPKILICDEPTSALDPQGRQEILSIISQLRGQ 179
L + +RI YS GMKQRLGLAQAL+H PK LI DEP SALDP GR E+L ++ +L+
Sbjct: 117 LHEAAHKRIGGYSGGMKQRLGLAQALLHKPKFLILDEPVSALDPTGRFEVLDMMRELKKH 176

Query: 180 KTVI FSTHILSDVEKVCDQVLI LTKSGIH- - -NLEDLRDKASASVNQKNLLIKVSDNEAQ 236
V+FSTH+L D E+VCDQV+I+ I L++L+ + +V L++ K+ +
Sbjct: 177 t^VLFSTHVLHDAEQVCDQWIMKNGSISWKGSLQELKQCQQTNVFTLSVKEKLEGWLEE 236

Query: 237 KlALRFPraQKDQYYKVHLELSEANNREQALASFYRYLVEQEITPYFIELLEDSLEDFYL 296
K+ ++ + EL ++ L+ ++ +T E +SLED YL
Sbjct: 237 KPYVSAIVYKWPS - -QAVFELPDIHAGRSLLSD C1RKGLTVTRFEQKTESLEDVYL 290

Query: 297 EVI 299
+V+
Sbjct: 291 KVV 293

There is also homology to SEQ ID 686.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 545

A DNA sequence (GBSx0584) was identified in S.agalactiae <SEQ ID 1737> which encodes the amino 
acid sequence <SEQ ID 1738>. Analysis of this protein sequence reveals the following: 

Possible site: 32
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4383 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAB71491 GB:U53767 ORF6 [Bacillus pumilus] 
Identities = 25/60 (41%) , Positives = 41/60 (67%) 

Query: 2 IGDTILFERTRLGMTQEKLSDYLHLTKATISKWENNQAKPDIDYLILMAKLFDMTLDELV 61
+G I +R L ++QE +++ h +++ ISKWE NQ++P +D LI +A+LFD + ELV
Sbjct: 4 LGSNISNKRKSLKLSQEYVAEQLGVSRQAISKWETNQSEPSMDNLIRLAELFDSDIKELV 63

There is also homology to SEQ ID 1740.

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

Example 546 

A DNA sequence (GBSx0585) was identified in S.agalactiae <SEQ ID 1741> which encodes the amino 
acid sequence <SEQ ID 1742>. Analysis of this protein sequence reveals the following:
Possible site: 41

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4241 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAB15470 GB:Z99121 yvdC [Bacillus subtilis]

Identities = 59/104 (56%) , Positives = 76/104 (72%) 

Query: 1 MDITAYQOWSEFYKKRTJWYQYNSFIRSNFLCEEVGELAQAIRKYEIGRDRPDEIEKSNN 60 
M + +KW+ EFY+KR W +Y FIR FL EE GELA+A+R YEIGRDRPDE E S 
Sbjct: 1 MQLADAEKWMKEFYEKRGVWEYGPFIRVGFLMEEAGELARAVRAYEIGRDRPDEKESSRA 60

Query: 61 ENLNDIKEELGDVLDNIFILADQYNISLEEIIEAHKNKLEKRFE 104 

E ++ EE+GDV+ NI ILAD Y +SLE++++AH+ KL KRFE 
Sbjct: 61 EQKQELIEEMGDVIGNIAILADMYGVSLEDVMKAHQEKLTKRFE 104 


No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

Example 547 

A DNA sequence (GBSx0586) was identified in S.agalactiae <SEQ ID 1743> which encodes the amino
acid sequence <SEQ ID 1744>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0453 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:BAB06803 GB:AP001517 unknown conserved protein [Bacillus halodurans] 
Identities = 87/187 (46%) , Positives = 125/187 (66%) 

Query: 1 MKITVFCGASNGNNPiySQKIvllLGEWMIKNNHDLvYGGGIOTGLMGVIADTVINNGGQAI 6 0 

MKI VFCG+SNG + +Y + +LG+ + + LVYGG VG+MG +AD+V+ GG+ I 
Sbjct: 1 MKIAVFCGSSNGASDWKEGARQI^KEIAHRGITLVyGGASVGIMGAVADSVLEAGGEVI 60 

Query: 61 GVIPTFLKDREIAHTNLSKLIVVENMPQRKGKMMSLGEAYIALPGGPGTLEEISEVISWS 120

GV+P FL++ EI+H +L+KLIWE M +RK KM L + ++ALPGGPGTLEE E+ +W+ 
Sbjct: 61 GVMPRFLEEPEISHPHLTKLIVVETMHERKAKMAELMJGFLALPGGPGTLEEFFEIFTWA 120 

Query: 121 RIGQNDSPCILYNINGYFNHLESMFDHMVSEGFLSQNDRNNVLFSDDIIEIEKFIKDYQS 180 

+IG + PC L NIN YF+ L ++ HM +E FL + R+ L D I + Y+ 
Sbjct: 121 QIGLHQKPCGLIMINHyFDPLVTLLHHMSNEQFLHEKYRSMALVHTDPILLLDQFSTYEP 180 

Query: 181 PTIRKYS 187 

PT++ YS 
Sbjct: 181 PTVKAYS 187 
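
The identity counts quoted for each alignment correspond to the number of aligned columns in which the Query and Sbjct rows carry the same residue. As a hedged illustration, the short sketch below counts identities for the final block of the alignment above; the function name `count_identities` is hypothetical and the code is not the alignment program used to generate the listings.

```python
def count_identities(query, sbjct):
    """Count aligned columns where both rows carry the same residue (gap columns excluded)."""
    if len(query) != len(sbjct):
        raise ValueError("aligned rows must have equal length")
    return sum(1 for q, s in zip(query, sbjct) if q == s and q != "-")

# Last block of the alignment above (Query 181-187 against Sbjct 181-187)
identities = count_identities("PTIRKYS", "PTVKAYS")
```

For this block the identical columns are P, T, Y and S, which agrees with the letters shown on the middle line of the alignment.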

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

Example 548 

A DNA sequence (GBSx0587) was identified in S.agalactiae <SEQ ID 1745> which encodes the amino 
acid sequence <SEQ ID 1746>. Analysis of this protein sequence reveals the following: 



Final Results 

bacterial cytoplasm --- Certainty=0.5288 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

Example 549 

A DNA sequence (GBSx0588) was identified in S.agalactiae <SEQ ID 1747> which encodes the amino 
acid sequence <SEQ ID 1748>. This protein is predicted to be integrase. Analysis of this protein sequence 
reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3685 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAF12706 GB:AF066865 integrase [bacteriophage TPW22] 
Identities = 106/377 (28%), Positives = 199/377 (52%), Gaps = 31/377 (8%)

Query: 4 ARYRRRGNQNLWAYEIREEGKTVAYNS GFKTKKLAEAEAEPILQKLRTGSI ITKNI 59
A +R+RG W + + + Y G+KTKK AEA A+ ++L S +I
Sbjct: 2 ANFRKRGKT- -WQFRLSYKDNNGEYKKFEKGGYKTKKEAEAAADEAKKRLNNHSEFDNDI 59

Query: 60 SLPELYQEWLDLKIMPSNRSDVTKKKYLSRKVTLEKLFGDKPISQIRPSEYQRIMNNYGQ 119
SL + +++W + P + ++ T + Y ++K DKPI++I P+ YQ ++N
Sbjct: 60 SLYDFFEKWAKVYKKP-HVTEATWRTYKRTLNLIDKYIKDKPIAEITPTFYQAVLNKMSL 118

Query: 120 RVSRNFLGRLNTGVKQSLQMAIADKVT'IIEDFTQNVELFSTVKSQDADSKYLHSEKAYLDL 179
+ h + +K ++++A+ +KV+ E+F + S + ++ + KYLH+++ YL L
Sbjct: 119 LYRQESLDKFYFQIKSAMKIAVHEKVISENFADFTKAKSKLAARPVEEKYLHADE-YLKL 177

Query: 180 INAVKDKFNYKKSWPYIIYFLLKTGMRYGELIALTWEDIDFDKGIFKTYRRFN-SETSQ 238
+ ++K Y + Y TGMR+ EL+ LTW +DFDK R ++ S T+
Sbjct: 178 LAIAEEKMEYTSY FACYLTAVTGMRFAELLGLTWSHVDFDKKEISIQRTWDYSITNN 234

Query: 239 FVPPKNKTSIRIVPVDNECLEILKNLKIEQNQSNKELGLQNTNNMVFQHFGYPNSVPSTN 298
F KN++S R +P+ ++ +++LK K KE+N+V+ SN
Sbjct: 235 FAETKNESSKRKIPISSKTIKLLKKYK KEYWHENKYDRVIYNL SNN 280

Query: 299 GTNKVLRGIVQELNIEPIITTKGARHTYGSFLWHRGYDLGIIAKILGHKDISMI.IEVYGH 358
G NK ++ ++ + P RH++ S+L ++G DL ++K+LGH++ +++ ++VY H
Sbjct: 281 GLNKTIK-VIAGRKVHP HSLRHSFASYLIYKGIDLLTVSKLLGHENLNVTLBCVYAH 335

Query: 359 TLEEKIQEEYNEIKQLW 375
L+E QE + +
Sbjct: 336 QLKEMEQEKNDVIRKIF 352

There is also homology to SEQ ID 578.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 550 

A DNA sequence (GBSx0589) was identified in S.agalactiae <SEQ ID 1749> which encodes the amino 
acid sequence <SEQ ID 1750>. Analysis of this protein sequence reveals the following: 

Possible site: 54 
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2710 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics.

Example 551

A DNA sequence (GBSx0590) was identified in S.agalactiae <SEQ ID 1751> which encodes the amino 
acid sequence <SEQ ID 1752>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2534 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 



Query: 


65 


Sbjct: 






124 


Sbjct: 


58 




184 


Sbjct: 


115 


Query: 


243 


Sbjct: 


173 


Query: 


303 


Sbjct: 


229 



ID+ W R E++ ++A+++A 



R+W N++ + R++ 



G K T D + I Q+ 

GASKLTKADLQVINQF 251 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 552 

A DNA sequence (GBSx0591) was identified in S.agalactiae <SEQ ID 1753> which encodes the amino
acid sequence <SEQ ID 1754>. Analysis of this protein sequence reveals the following: 

Possible site: 13 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2700 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 553

A DNA sequence (GBSx0592) was identified in S.agalactiae <SEQ ID 1755> which encodes the amino 
acid sequence <SEQ ID 1756>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3121 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1757> which encodes the amino acid 
sequence <SEQ ID 1758>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2913 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below:
Identities = 19/52 (36%), Positives = 33/52 (62%)



F NL L ++ I Q+++ N+L I K +1+ Y K ++ PT N+ K+A++F 
Sbjct: 15 FSTNLNMLMAKKNIKQIDIHNKLGIPKSTITGYVKORSLPTAGNVQKIiADFF 66 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 554 

A DNA sequence (GBSx0593) was identified in S.agalactiae <SEQ ID 1759> which encodes the amino 
acid sequence <SEQ ID 1760>. Analysis of this protein sequence reveals the following: 

>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:AAA98584 GB:L44593 ORF536; putative [Lactococcus phage BK5-T]
Identities = 248/532 (46%), Positives = 359/532 (66%), Gaps = 16/532 (3%)

Query: 1 MNFIEQISENNQFPIIFVGSGITQRYFFJ3APTWEKLLKDIWLELFDEESYYAK- -AFELR 58 

MNFIE I +NNQFPIIFVGSG+T+RYF+N WE+LL ++W + +E+++Y + FE 
Sbjct: 1 MNFIENIKDNNQFPIIFVGSGVTKRYFKNGLK^QLLLELW^VEEEKAFYTQYHVFENL 60 

Query: 59 ERFEN NDFDIYTNLASLLEKEVSKAFINGNIQVDNLDLKTAYELNISPFKQLVAN 113 

+ +N +F+I +A +LE++++ AF + + +DNL L A+ +ISPF+Q +AN 

Sbjct: 61 LKSKNLSKSDKEFEINLtWGILEEKIN^FYSDELNIDNLTLAQAHTEHISPFRQCIAN 120 

Query: 114 RFSNLKIREEKIEEIKQFSQMLSKARI I ITTNYDNFIEECLKTINVSVKINVGNKGLFLK 173 
Sbjct:


121 


Query: 


174 


Sbjct: 


181 


Query: 


234 


Sbjct: 


241 


Query: 


294 


Sbjct: 


301 


Query: 


354 


Sbj Ct: 


361 






Sbjct: 


421 




469 


Sbjct: 


481 



S+DYGELYKIHG+V + +TI IT EDY+ N +K AL+NAKILSNL ESPILF+GYSLTD+ 



NIR+LLT ++EN P++ISE+A +IGWEY PD 1+ +VS++PDL ++Y+ + TDN+ 



I YRL I SKINQGFLPSEI AKYENVFRKI I E VKGE S KDLXTVLTS YEDLANLTEDE IRSKNI 3 53 
IY IS+I QG+LPSEIAK+E FRKIIEVKG+ K+L TVLTS+ D++ + +E+++KNI 
IYDEISQIEQGYLPSEIAKFEGAFRKIIEVKGKEKELDTVLTSFIDISKINTEELKNKNI 360 



3NKYTENI KKRLSKEEELSLDD FTS S I GVPLL - -HSKTLERQTEIVGILE-ADV 468 
+ E++K R+S E + ++ L + L + + I ++ ++V 

Sbjct: 421 GS I PNDLVQEVESLKTRI SNFPES I VRT YS I KANKDLAKKYLPYLNKTSTIEDVMSLSNV 480 

'FIATHIKNFPKEELFLLVEKI ID EG I FETSRRRFLKAFDLL 516 

FI IF EEL + K ID +GI T R+ + ++ ++ 
FILFKIDKFKVEELKDFIVKNIDMGEGKGISSTLYRKIVMSYSII 532 

A related GBS gene <SEQ ID 8599> and protein <SEQ ID 8600> were also identified. Analysis of this
protein sequence reveals the following:

Lipop: Possible site: -1 Crend: 8 
McG: Discrim Score: 1.55
GvH: Signal Score (-7.5): 0.27

Possible site: 54
>>> Seems to have a cleavable N-term signal seq.
ALOM program count: 0 value: 2.44 threshold: 0.0
PERIPHERAL Likelihood = 2.44 214
modified ALOM score: -0.99

*** Reasoning Step: 3

Final Results

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

47.3/69.1% over 531aa 

Lactococcus lactis 

EGAD|36707| hypothetical protein Insert characterized
GP|928833|gb|AAA98584.1||L44593 ORF536; putative {Lactococcus lactis phage BK5-T} Insert characterized
PIR|T13261|T13261 hypothetical protein 536 - phage BK5-T Insert characterized
ORF00184 (301 - 1848 of 2154)
EGAD|36707|38110(1 - 532 of 536) hypothetical protein {Lactococcus lactis}GP|928833|gb|AAA98584.1||L44593 ORF536; putative {Lactococcus lactis phage BK5-T}PIR|T13261|T13261 hypothetical protein 536 - Lactococcus lactis phage BK5-T
%Match =32.3 

%Identity =47.2 %Similarity =69.0 

Matches = 247 Mismatches = 155 Conservative Sub.s = 114 

126 156 186 216 246 276 306 336 

RMLILKAFYLAKFLKYYC*KK*CGTKRGQLYFRVYGLIIKINK^ 

=1111 I :||| 
FPIIFVGSGITQRYFENAPTWEKXjLKDIWLELFDEESYYAK- -AFE--LRER FEMMDFDIYTNLASLLEKEVSKAFI

llllllllhhllhl 11 = 11 = = l = =l===l = II 1= = = = 1 = 1 =1 =11 = : = : II 

FPIIFVGSGOTKRYFKNGLKOTEQLLLELVWLVEEEKAFYTQYHVFENLL;<Sia^LSKSDKEFEIHLMMAGILEEKIMNAFY 



NGNIQVDNLDLKTAYELNISPFKQLVANRFSNLKIREEKIEEIKQFSQMLSKARIIITTNYDNFIEECLKTINVSVKINV 
= = =111 I |: =1111=1 =11 Mil == III 11=11 III 1=11111111111= 111=1=11 
SDEkNIDNLTLAQAHTEHISPFRQCIAOTFSNLDRKXGFDEEIISFSKML^ 



825 855 885 915 945 975 1005 1035 

gnkglflkssdygelykihgtvddastititkedyekn\t:ksalinakilsnlvespilflgysltdenirklltdfaen 

II 111 = 11 = 1111111111 = 1 = =11 II 111= I =1 Ihllllllll 111111 = 111111 = 111 = 111 =:|l 

gnsglfvksndygelykihgsvknpnticitsedyiq^sklalvnakilsnltespilfigysltdknirelltsysen 



1055 1095 1125 1155 1185 1215 1245 1275 

spfdisesaqkigweylpdsesietwsslpdlsvyysclktdnftniyrliskinqgflpseiakyenvfrkiievkg 
l = = lll = l =111111 II != =ll = = lll = = l= = 111= II 11 = 1 11 = 1111111 = 1 IIIIIMII 
lpyeiseaaarigweytpdkieiqdivsnipdlgihytkistdnyickiydeisqieqgylpseiakfegafrkiievkg 



15S9 1599 1S29 1683 1710 1740 

YMFAMSEY--ISKDSNKYTENIKKRLSKEEELSLDDFTSSIGVPLLHS--KTLERQTEIVGILE-ADVPDNVRYNFIATH 
=1 = = I I = l==l 1=1 I = == I I = = I == ==ll = II 

HMGVIESWGSIPM)LVQEVESLKTRISNFPESIVRTYSI!^KDLAK1CYLPYMKTSTIEDVMSLSNVPLYNKLRFII,FK 



1770 1818 

IKNFPKEEIiFLLVEKIID EG! 

I I III == I II =11 11===::= | 

IDKFKVEELKDFIVKNIDMGEGKGISSTLYRKIVMSYSIITEGI 



No corresponding DNA sequence was identified in S.pyogenes. 

SEQ ID 8600 (GBS142) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 18 (lane 5; MW 54kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 33 (lane 6; MW 79.8kDa). 

The GBS142-GST fusion product was purified (Figure 195, lane 3) and used to immunise mice. The 
resulting antiserum was used for Western blot (Figure 249). These tests confirm that the protein is 
immunoaccessible on GBS bacteria. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 555 

A DNA sequence (GBSx0594) was identified in S.agalactiae <SEQ ID 1761> which encodes the amino 
acid sequence <SEQ ID 1762>. This protein is predicted to be integrase. Analysis of this protein sequence
reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2933 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAA98585 GB:L44593 integrase [Lactococcus phage BK5-T] 
Identities = 124/382 (32%), Positives = 202/382 (52%), Gaps = 21/382 (5%)

1 MATYRQRGKKKLWDYRI FNEKSELVA- SGSGFKTKREAMNEAMRIE QQKLLVNSI SS 56 

MATY++RGK W Y I K h + GF TK +A RAM IE ++ +V+ I 
1 MATYQKRGKT- -WQYSISRTKQGLPRLTKGGFSTKS3AQAEAMDIESKLKKGFIVDPIKQ 58 

57 DITLYDL-MFEWYSLIIKPSNIMTTKNKYFTRGSVIRKLFGNQKOTKIKHSAYQRKLNT 115 
+1+ Y WEY K+ + ET Y ++ N S+YQR LN 

Sbjct: 59 E I SEYFKDWMELY KKNAIDEMTYKGYEQTLKYLKTYMPNVLISEITASSYQRAMK 114 

Query: 

Sbjct 

Query 

Sbjct 

Query; 

Sbjct 

Query; 

Sbj ct 

Query: 

Sbjct: 



175 YKKVTSYLENNLD- - YSNSIVYYLLLVLFKTGLRVGEALALTWDDVNFEDLEIKTYR- -R 230 

YK+++ Y N L+ YS+ 4- +++ + TG+R EA L WDD++F + IK R 
173 YKQLVDYFKNRLNPNYSSPTMLFIISI TGMRASEAFGLVWDDIDFNNNTIKCRRTWN 229 



291 VSTNSAINKSLKNVLKIIjNINSKMTATGARHTYGSYLLAKGTOI^WVARLMGHKDITQLL 350 

+ T SA+ +L + LK LNI++ +T G RHT+ S LL GVDI V++ +GH + 
290 IITLSALQNTLDHALKKLNISTPIiTIHGLRHTHASVLIjYHGVDIMTVSKRLGHASVAITQ 349 

351 ETYGHVLTEVINKEYETVRSLV 372 

+TY H++ E+ NK+ + 4- L+ 
350 QTYIHIIKELENKDKDKIIELL 371 



There is also homology to SEQ ID 578. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

Example 556 

A DNA sequence (GBSx0595) was identified in S.agalactiae <SEQ ID 1763> which encodes the amino 
acid sequence <SEQ ID 1764>. Analysis of this protein sequence reveals the following: 



>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1603 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10209> which encodes amino acid sequence <SEQ ID
10210> was also identified.

The protein has homology with the following sequences in the GENPEPT database: 

>GP:BAB07266 GB:AP001519 unknown conserved protein in others
= 39/71 (54%), Gaps = 6/71 (8%)

Query: 37 WWDIDNLQELLGIGRSKLINDILLNPDIKKEVDLSINPNGFIVyPKGKGSRYKIIATK-- 94 

WW + +L+E G L +ILL+P K +D I GF+ YP+ KG R+ +A+ 

Sbjct: 4 WWSMQDLKERTGYSEDWLKENILLHPRYKPMLD- - IENGGFVYYPEKKGERWCFIASSME 61 

Query: 95 --ARKYFEDNF 103 

+KYF+D F 
Sbjct: 62 EFLKKYFKDIF 72 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 557 

A DNA sequence (GBSx0596) was identified in S.agalactiae <SEQ ID 1765> which encodes the amino 
acid sequence <SEQ ID 1766>. Analysis of this protein sequence reveals the following: 
Possible site: 14 

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -3.88 Transmembrane 12 - 28 ( 11 - 29)

Final Results

bacterial membrane --- Certainty=0.2550 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
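
Where a membrane location is predicted, the analysis lists each putative membrane-spanning segment on an "INTEGRAL Likelihood" line with its residue range. A minimal sketch of reading such a line follows; the regular expression and the name `parse_tm_segments` are illustrative assumptions, and only the first residue range on the line is captured (the bracketed range is ignored).

```python
import re

# Assumed layout: "INTEGRAL Likelihood = <score> Transmembrane <start> - <end> ( ... )"
TM_RE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+(?:\.\d+)?)\s+Transmembrane\s+(\d+)\s*-\s*(\d+)"
)

def parse_tm_segments(text):
    """Return one dict per predicted transmembrane segment found in the text."""
    segments = []
    for score, start, end in TM_RE.findall(text):
        segments.append({
            "likelihood": float(score),
            "start": int(start),
            "end": int(end),
            "length": int(end) - int(start) + 1,  # inclusive residue count
        })
    return segments

line = "INTEGRAL Likelihood = -3.88 Transmembrane 12 - 28 ( 11 - 29)"
segments = parse_tm_segments(line)
```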

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAB99663 GB:US7604 chromosome segretation protein (smcl) 
[Methanococcus jannaschii] 
Identities = 53/210 (25%) , Positives = 95/210 (45%) , Gaps = 33/210 (15%) 

Query: 20 IFTNVGVLISNSRDNKAIQRELELLEEGQEKLVDEFSKISTNQYDICYV LI 69 

+F +G+L N + + + + + K++DE S 1+ K LI 

Sbjct: 133 LFRRLGLLGDNVISQGDLIiKIINISPIERRKIIDEISGIAEFDEKKKKAEEELKKARELI 192 

Query: 70 Q SNLSNNIEKNKQELVQKNSYVK- -EDTKYIRDEMLIEKKSK EEVYNHV 116 

+ S + NN++K K+E Y+K E+ K + ++++K S E + N + 

Sbjct: 193 EMIDIRISEVENNLKKLKKEKEDAEKYIKIiNEELKAAKYALILKCTSYLNVLLENIQNDI 252 

Query: 117 KNGDKLIEI^FAIffiLILKFGHVSRENQKLGLKvNSLEEKIvnLSNQPKNDEISKLRKSI 176 

KN ++L NE + K E+ E + L L++N+ I++ N+ N+E+ +L KSI 
Sbjct: 253 KNLEEL KNEFLSKVREIDVEIENLKLRLNN IINELNEKGNEEVLELHKSI 302 

Query: 177 SSFERELSRFEDVGYSEAEEIKSTLRRILN 206 

E E+ + V S E+K I N 

Sbjct: 303 KELEVEIENDKKVLDSSINELKKVEVEIEN 332 

No corresponding DNA sequence was identified in S.pyogenes. 

SEQ ID 1766 (GBS315) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 42 (lane 4; MW 26.7kDa) and in Figure 239 (lane 5; MW 41kDa). It was also 
expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell extract is shown in Figure 47 
(lane 5; MW 52kDa). 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics.

Example 558

A DNA sequence (GBSx0597) was identified in S.agalactiae <SEQ ID 1767> which encodes the amino 
acid sequence <SEQ ID 1768>. This protein is predicted to be surface protein. Analysis of this protein 
sequence reveals the following: 

Possible site: 26 

>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood = -7.70 Transmembrane 229 - 245 ( 226 - 248)

Final Results

bacterial membrane --- Certainty=0.4079 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAA47097 GB:XS6468 orf iota [Streptococcus pyogenes]

Identities = 90/262 (34%) , Positives = 138/262 (52%) , Gaps = 26/262 (9%) 

Query: 4 VKVLSLITV-SGLFLMAGNLSASADWISGGDTIMLSGVDAGVSDSIMPPPSSINPV- -- 59
+K L+L+T+ S L++ + + AD S D +L+ D V P + ++PV
Sbjct: 1 KKKLALLTLFSTTLLVSAPIVSFADETASSSDINILADDDPWPVEPTDPTTPVDPVDPV 60

Query: 60 TDTTEPSAPTPSTDPI - -TDTTEPSAPTPSTDPI - -TDTTEPSAPTPST 104
T+ TEP+ PT T+P T+ TEP+ PT T+P T+ TEP+ PT T
Sbjct: 61 DPVDPVDPVDPTEPTEPTEPTEPTEPTEPTEPTEPTEPTEPTEPTEPTEPTEPTEPTEPT 120

Query: 105 DQTTGTTDSS-TPSSSTTNPVDGITDNGTKPNAGIDKPSTNKPSDHSESSI - -KPVTKPT 161
+ TT + TSTP + T+P + +PS +E ++ KPV
Sbjct: 121 EPTEPTEPTEPTEPSKPTEPTE - - PSKPTEPTEPTEPSKPTEPSKPTEPTVPNKPVDTNP 178

Query: 162 INQPITTVTGDQVIGTQDGKVLVQTPSGTQLK-DAAEVGGNVQKDGTVAIKKSDGKIEVL 220
I P+ T TG ++ +D K ++Q GT K +A E+G +VQKDGTV +K SDGK++VL
Sbjct: 179 IENPVNTDTGVVIVAvEDSKPIIQLArX3^^^ 238

Query: 221 PKTGEGKTI - FTIVGLLLIAGA 241
PKTGE I +++G L++ G+
Sbjct: 239 PKTGETANIALSVLGSLMVLGS 260

There is also homology to SEQ ID 760. 

SEQ ID 1768 (GBS141) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 19 (lane 4; MW 35kDa). The GBS141-His fusion product was purified (Figure
194, lane 3) and used to immunise mice. The resulting antiserum was used for FACS (Figure 295), which
confirmed that the protein is immunoaccessible on GBS bacteria.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 559 

A DNA sequence (GBSx0598) was identified in S.agalactiae <SEQ ID 1769> which encodes the amino 
acid sequence <SEQ ID 1770>. Analysis of this protein sequence reveals the following: 

>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
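Each "Final Results" block lists three candidate localisations with a certainty value. The sketch below is illustrative Python, not the tool that produced the output; it reads such a block and reports the localisation with the highest certainty. The function names are hypothetical, and the pattern deliberately tolerates the stray spaces that scanning leaves inside the numbers.

```python
import re

# One result line: "bacterial <location> ... Certainty=<value> (<verdict>)".
# Spaces inside the number (a scanning artifact) are allowed and removed later.
LINE = re.compile(r"bacterial\s+(\w+)\W*Certainty\s*=\s*([\d. ]+?)\s*\(([^)]*)\)")


def parse_final_results(text):
    """Return {location: (certainty, verdict)} for a Final Results block."""
    results = {}
    for location, value, verdict in LINE.findall(text):
        results[location] = (float(value.replace(" ", "")), verdict.strip())
    return results


def predicted_location(results):
    """Location with the highest certainty (first one wins on ties)."""
    return max(results, key=lambda loc: results[loc][0])


block = """bacterial outside Certainty=0. 3000 (Affirmative) < suco
bacterial membrane --- Certainty=0. 0000 (Not Clear) < suco
bacterial cytoplasm Certainty=0 .0000 (Not Clear) < suco"""
print(predicted_location(parse_final_results(block)))
```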
The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

A related GBS gene <SEQ ID 8601> and protein <SEQ ID 8602> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 4
McG: Discrim Score: 14.39
GvH: Signal Score (-7.5): -1.23

Possible site: 18
>>> Seems to have a cleavable N-term signal seq.
ALOM program count: 0 value: 8.96 threshold: 0.0
PERIPHERAL Likelihood = 8.96 104
modified ALOM score: -2.29

*** Reasoning Step: 3 

Final Results 

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

SEQ ID 1770 (GBS17) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 4 (lane 2; MW 24kDa).

The His-fusion protein was purified as shown in Figure 189, lane 10. 

Example 560 

A DNA sequence (GBSx0599) was identified in S.agalactiae <SEQ ID 1771> which encodes the amino 
acid sequence <SEQ ID 1772>. Analysis of this protein sequence reveals the following: 

Possible site: 23
>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS gene <SEQ ID 10779> and protein <SEQ ID 10780> were also identified. A further related
GBS nucleic acid sequence <SEQ ID 10957> which encodes amino acid sequence <SEQ ID 10958> was
also identified.

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

SEQ ID 1772 (GBS643) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 129 (lane 2-4; MW 79kDa) and in Figure 186 (lane 2; MW 79kDa). It was also
expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell extract is shown in Figure 129
(lane 5-7; MW 54kDa) and in Figure 176 (lane 5; MW 54kDa).

GBS643-GST was purified as shown in Figure 236, lane 7. 
Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 561 

A DNA sequence (GBSx0600) was identified in S.agalactiae <SEQ ID 1773> which encodes the amino 
acid sequence <SEQ ID 1774>. Analysis of this protein sequence reveals the following: 

Possible site:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.5815 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.


Example 562 

A DNA sequence (GBSx0601) was identified in S.agalactiae <SEQ ID 1775> which encodes the amino 
acid sequence <SEQ ID 1776>. This protein is predicted to be membrane protein. Analysis of this protein 
sequence reveals the following: 



Possible site: 33
>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -13.32 Transmembrane 311 - 327 ( ... - 332)
INTEGRAL Likelihood = -10.46 Transmembrane 293 - 309 ( 282 - 310)
INTEGRAL Likelihood = -8.55 Transmembrane 390 - 406 ( 388 - ...)
INTEGRAL Likelihood = -7.64 Transmembrane 49 - 65 ( 40 - 69)
INTEGRAL Likelihood = -5.68 Transmembrane 100 - 116 ( 98 - 122)
INTEGRAL Likelihood = -4.35 Transmembrane ... - 146 ( 127 - 148)
INTEGRAL Likelihood = -3.88 Transmembrane 344 - 360 ( 342 - ...)





Final Results

bacterial membrane --- Certainty=0.6328 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database: 



Query: 13 FAKVKDVDIFALKAYMEITH-GAETGAQSILLDVFVNFPFFLLNLIVGLFSVILRFFENF 71
FAK+K VDIF+LK+YME T+ G+ GA ++ ++FVN FF+LN +VG FS+++R E
Sbjct: 5 FAKLKGVDIFSLKSYMEPTNFGSFNGAVTO;INELF\'NLF?FILNAVVGFFSLLIRILEKI 64

Query: 72 SLYDTYKQTVYHSSQKLWENLSGN--GSYTS-SLLYLLVAISAFSIFISYLFSKGDFSKR 128
LY TYK V+H + +W +G+ G+ T+ SL+ L+ + AF +F Y FSKG FS+
Sbjct: 65 DLYATYKTYVFHGASSIWHGFTGSKTGNITNKSLVGTLLLVLAFYLFYQYFFSKGSFSRT 124

Query: 129 LIHLFWIILGMGYFGTIQSTSGGIYILDTVHQLAGSFSDAVTNLSLDNPSGGKTKITQK 188
L+H+ +V++L +GYFGT+ TSGG+Y+LDTV+ ++ + + + +D KI +
Sbjct: 125 LLHVCLVLLLALGYFGWAGTSGGBTLLDTVI^SKDVTKKIAGIKVDYAKDKSIKIGK- 183
Sbjct: 184
Query: 249
Sbjct: 243
Query: 309
Sbjct: 303
Query: 369
Sbjct: 363
Query: 429
Sbjct: 422
Query: 479
Sbjct: 479
Query: 539
Sbjct:
Sbjct:
599
659
Sbjct: 620



Y D+LG+ A + EKNRW+SA+ D+++ + YVI KI EA +IAVP+IDIQL+ 



A +LV+ ++ +FP+ LL+SF+PRMQ+^+P VLKVMFG + FPA+ LTL++FY 4 



++F LL++V +G +++ IW++K L+ I+GS+A V 



- - INQSVDKINEKAENLGITPKSIYERAHDMSSLAMMGAGYGVGTMMNAQ- - - DN 478 
+ VKEA + P A++ + GAG+G G MMNA+ N 

■ -PTRSIATAQKLGNFTIAGAGFGTGVMMNAfCSHFQN 478 



SPLKQRI<LNKLEGELSQFNSDVSMTKNHGKNAFEKGFNASKTKEWKQHNLERQSKVLEE 658 
SP KQ ++N LE L + +M K G NAF + + + T++ + + N+ER+ ++ + 
SPFKQHRINTLERRLDAYKDPQAMYKAQGSNAFTRAYRKTLTRDDKIRANIERRDRLTQR 619 



No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

Example 563 

A DNA sequence (GBSx0602) was identified in S.agalactiae <SEQ ID 1777> which encodes the amino 
acid sequence <SEQ ID 1778>. This protein is predicted to be conjugative protein. Analysis of this protein 
sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3714 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:CAB70617 GB:AJ243106 conjugative protein [Streptococcus thermophilus] 
Identities = 515/757 (68%), Positives = 612/757 (80%), Gaps = 1/757 (0%) 

Query: 1 MSDFERDIADDVKELGLETLDFTVX)TLTHE^1EIPYQFDWLIGVDLGKGQYNANIKEFIYN 60
M DF IADD +ELG E L +TVD LT EMEIPYQFDW+IGV L K + A +K+ Y
Sbjct: 78 MRDFSEALADDSRELGEELLLYTVDRLTDEMEIPYQFDWIGVTLRKQNHGATVKDLAYE 137

Query: 61 QFESIASNFASIAGYEVEVDEDl'?YKEHSEEELL\rYSLLSTLKAKRLTDVDLFYYQRMQFL 120
F + A GYE + WY ++ +E ++ S L+AKRLT+ +LFYYQRMQ+L
Sbjct: 138 SFNEFSEKIAKGLGYEYALSPTWYDDYRSDEFXIFQAFSVLRRKRLTNEELFYYQRMQYL 197
Query: 121
Sbjct: 198
Query: 181
Sbjct: 258
Query: 241
Sbjct: 318
Query: 301
Sbjct: 378
Query: 361
Sbjct: 438
Query: 421
Sbjct: 498
Query: 481
Sbjct: 558
Query: 541
Sbjct: 618
Query: 601
Sbjct: 678
Query: 660
Sbjct: 738
Query: 720
Sbjct: 798



RY+PH K EV+ANR+ N+TDTLIK L+GGFL+LES YGSSFV++LPVG+F FNGFHL 



GE VQR++FPVELR KAEFID K+ G MGRSNTRY IM+EA NT+TVQQD+I+MG+ £ 



LKDLMKKVGNKE+IIEYG YL+V+ SS+NQL+QRR IL+YFDDM V + EAS D PYLF 



QALLYG++LQK TR W H+VTARGFSELM FTNT SGNRIGWYIGRVDN + 



SKN+VL+NATV NKED+AGK+TKNPH+IITGATGQGKS+LAQ+IFL A QNV+ I 



+DPKRELR HY +V++ PE+AR++P RKKQI+ NFVTLDSS+ NHGVLDPIV+LDKE 



AKNML +LL+ ++ +DQ TA+TEAI+ 4+ +R AGE VGF V+E L ++ S E 



- SVGRY +1+ NSILELAFSDG GL+YE RVT+LEV +L LPKD S ISDHE US 



IALMFALGAFC HFGHR+++E T+E FD3AW+LM+S+EGKAVIK+MRR+GRSK N J. L+ 



+QSVHDAENDDDTTGFGTIF+FYEKSEREDIL HV LEVT NLEWIDMMISGQCLYYDV 



YGNLNMIS+HN+ DID LLKPMK TVSS LENKYAS 



No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.


Example 564 

A DNA sequence (GBSx0604) was identified in S.agalactiae <SEQ ID 1779> which encodes the amino 
acid sequence <SEQ ID 1780>. This protein is predicted to be ISL2 protein. Analysis of this protein 
sequence reveals the following: 

Possible site: 26
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3469 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 
>GP:CAC18595 GB:AJ278419 IS1381 transposase [Streptococcus pneumoniae]
Identities = 110/125 (88%), Positives = 119/125 (95%)

Query: 81 MNYE^SKQLTDWPIO^LVGVQRTTFEEM]^VLKTAYQRKH2UCGGRTPKLSLEDLLMATLQ 140
MNYEASKQLTD RFICRLVGVQRTTFEEMLAVLKTAYQ KHAKGGR PKLSLEDLLMATLQ
Sbjct: 1 MiraaSKQLTDARFKRLVGVQRTTFEEMIAVLKTAYQLKHaKGGRKPKLSLEDLLMATLQ 60

Query: 141 YMREYRTYEQIAADFGIHESNLIRRSQWVESTLIQSGFTISKTHLSAEDTVIVDATEVKI 200 

Y+REYRTYE+IAADFG+HESNr,+RRSOWVE TL+OSG TIS+T LS+EDTV++DATEVKI 
Sbjct: 61 YVREYRTYEEIAADFGVHESNLLRRSQWVEVTLVQSGVTISRTPLSSEDTVMIDATEVKI 120 



Query: 201 NRPKK 205 
Sbjct: 121 NRPKK 125 



No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.


Example 565 

A DNA sequence (GBSx0605) was identified in S.agalactiae <SEQ ID 1781> which encodes the amino 
acid sequence <SEQ ID 1782>. Analysis of this protein sequence reveals the following: 

Possible site: 61

Final Results

bacterial membrane --- Certainty=0.6031 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

Example 566 

A DNA sequence (GBSx0606) was identified in S.agalactiae <SEQ ID 1783> which encodes the amino 
acid sequence <SEQ ID 1784>. This protein is predicted to be Cag-W. Analysis of this protein sequence 
reveals the following: 

Possible site: 59 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -3.82 Transmembrane 50 - 66 ( 49 - 66) 
INTEGRAL Likelihood = -3.72 



Final Results 

bacterial membrane --- Certainty=0.2529 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 
Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 567 

A DNA sequence (GBSx0607) was identified in S.agalactiae <SEQ ID 1785> which encodes the amino 
acid sequence <SEQ ID 1786>. Analysis of this protein sequence reveals the following:

Possible site: 55
>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -7.80 Transmembrane 36 - 52 ( 32 - 60)

Final Results

bacterial membrane --- Certainty=0.4121 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:CAB12298 GB:Z99106 similar to transposon protein [Bacillus subtilis] 
Identities = 68/339 (20%), Positives = 133/339 (39%), Gaps = 49/339 (14%) 



Query: 16 KKEEGGKQPKTKEVKQRTANFIV- -YGILGLLFIVGPFGSLRAIGLSNQVQHLKETVIAV 73
K+ E ++ K K + R+ V + +G L + L +1 +Q+ +K+
Sbjct: 24 KRIERPEKDKQKVPRDRSKLIAVTLWSCVGSLLFICLIAVLLSINTRSQLNDMKDETNKP 83

Query: 74 EKKSKHKKTDDSLDISRIQYYMNNFVYYYINYS--QDTADQRKTELENY YSF 123
K K +++++++ F+ Y+N Q++ ++R LE+Y +
Sbjct: 84 TNDDKQK ISvTAAEMFLSGFINEYMNVKNDQESIEKRMQSLESYMVKQEDNHFED 138

Query: 124 STASMTDDVRKSRTLQTQRLISVEKEKDYYIALMRIGYEV- 163
D ++ R L+ L +V++ + ++ YE
Sbjct: 139 EERFNVDGLKGDRELKGYSLYNVKEGDK1JSLFQYKVTYENLYPVEKEVEKEVKDGKKKKK 198

Query: 164 DKKSYQMNLAVPFQMQRGLLAIVSQPYTVAEDLYLGKSKAFEKKTLDQVKEL 215
+K QM L +P + A+ + PY +Y K K + E
Sbjct: VT<EKVKTNEKYEKQMLLNIPVTNKGDSFAVSAVPYFT--QIYDIjKGDIAFKGKEETRDEY 256

Query: 216 SKEQVSSIQKFLPVFFNKYALINKTDLKLLMKTPELMGKGFKVSELDLNNAIYYQEKKHQ 275
+ E+ SI+ FL FF KYA K ++ +MK PE + E + + ++ KK
Sbjct: 257 AGEKKESIESFLQNFFEKYASEKKEEMVYMMKKPEALEGNLLFGE-- VQSVKIFETKKGF 314

Query: 276 WQLSVTFEDLVTGGTRSENFTLYLFKADNGWYVEEMYH 314
V +V F++ +E F+L + + +YV ++ H
Sbjct: 315 EVTCAVRFKEKENDIPVNEKFSLEITENSGQFYVl'IKLKH 353





No corresponding DNA sequence was identified in S.pyogenes. 

SEQ ID 1786 (GBS333d) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total
cell extract is shown in Figure 145 (lane 8-10; MW 58kDa). It was also expressed in E.coli as a His-fusion
product. SDS-PAGE analysis of total cell extract is shown in Figure 145 (lane 11 & 13; MW 33kDa), in
Figure 182 (lane 2; MW 33kDa) and in Figure 185 (lane 3; MW 58kDa).

GBS333d-GST was purified as shown in Figure 236, lane 2.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.
Example 568

A DNA sequence (GBSx0608) was identified in S.agalactiae <SEQ ID 1787> which encodes the amino 
acid sequence <SEQ ID 1788>. Analysis of this protein sequence reveals the following: 

Possible site: 54
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4177 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:CAB38326 GB:Y17736 hypothetical protein [Streptomyces
coelicolor A3(2)]
Identities = 45/80 (56%), Positives = 56/80 (69%)

Query: 4 FTEEAWKDWSWQQEDKKILKRINRLIEDIKRDPFEGIGKPEPLKYHYSGAWSRRITEEH 63
FT W+DYV W + D+K+ KRINRLI DI RDPF+G+GKPEPLK SG WSRRI + H
Sbjct: 5 FTSHGWEDYVHWAESDRKVTKRINRLIADIARDPFKGVGKPEPLKGDLSGYWSRRIDDTH 64

Query: 64 RLIYMIEDGEIYFLSFRDHY 83
RL+Y D ++ + R HY
Sbjct: 65 RLVYKPTDDQLVIVQARYHY 84
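The "Identities" figure in a hit summary counts the aligned columns in which the query and subject residues are the same. As a minimal sketch in Python (an illustration, not the program used for this analysis), the count can be re-derived for one alignment block. The function name `alignment_counts` is hypothetical. "Positives" also depend on a substitution matrix and are not covered here.

```python
def alignment_counts(query, sbjct):
    """Return (identities, gap_columns, columns) for two aligned rows.

    Both rows must be the same length; '-' marks a gap.
    """
    if len(query) != len(sbjct):
        raise ValueError("aligned rows must have equal length")
    identities = sum(1 for q, s in zip(query, sbjct) if q == s and q != "-")
    gap_columns = sum(1 for q, s in zip(query, sbjct) if q == "-" or s == "-")
    return identities, gap_columns, len(query)


# Last block of the alignment above (query residues 64-83, subject 65-84).
print(alignment_counts("RLIYMIEDGEIYFLSFRDHY", "RLVYKPTDDQLVIVQARYHY"))
```

Summing the per-block counts over a whole alignment gives the numerator and denominator of the "Identities" figure, provided the scanned rows are intact.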

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

Example 569 

A DNA sequence (GBSx0609) was identified in S.agalactiae <SEQ ID 1789> which encodes the amino 
acid sequence <SEQ ID 1790>. Analysis of this protein sequence reveals the following:

Possible site: 53
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.5669 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10211> which encodes amino acid sequence <SEQ ID
10212> was also identified.

The protein has homology with the following sequences in the GENPEPT database:

>GP:AAD17306 GB:AF121418 putative Phd protein [Francisella
tularensis subsp. novicida]
Identities = 26/84 (30%), Positives = 45/84 (52%)

Query: 4 MEAIVYSHFRNNLKDYMKIO/NDEFEPLIWKKNPDENI WLSQDSWESLQETIRLMENDY 63
M+ + YS FRN L D M +V P+IV + E +V++S + +++ +ET LM +
Sbjct: 1 MQTVNYSTFRNELSDSMDRVTKNHSPMIVTRGSKKEAWMMSLEDFKAYEETAYLMRSMN 60

Query: 64 LSHKVINGISQVKEKQVTKHGLIE 87
++ N I +V+ + LIE
Sbjct: 61 NYKRLQNSIDEVESGLAIQKELIE 84



No corresponding DNA sequence was identified in S.pyogenes. 
Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.


Example 570 

A DNA sequence (GBSx0610) was identified in S.agalactiae <SEQ ID 1791> which encodes the amino 
acid sequence <SEQ ID 1792>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2407 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

Example 571 

A DNA sequence (GBSx0611) was identified in S.agalactiae <SEQ ID 1793> which encodes the amino 
acid sequence <SEQ ID 1794>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1274 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10213> which encodes amino acid sequence <SEQ ID 
10214> was also identified. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAB60015 GB:U09422 ORF18 [Enterococcus faecalis] 
Identities = 41/140 (29%) , Positives = 73/140 (51%) , Gaps = 3/140 (2%) 

Query: 23 FPvEMSELKLALGLREEDDLEYIIADSDCQL-LKEHDSrEMINQFVELVENVDSELVKAV 81 

FP++ E+K +GL +E + EY I D + + E+ SI +N+ E+V + EL + 
Sbjct: 26 FPIDFEEVKEKIGLNDEYE-EYAIHDYELPFWDEYTSIGELNRLWEMVSELPEELQSEL 84 

Query: 82 HQVIGYTASDFVDYDFNFGDCCLLSD\m'RRELGEYYFDELGVQGVGKEALEMYFDHEAY 141 

++ + +S + + D + SD ++ YY +E G G +L+ Y D++AY 

Sbjct: 85 SALLTHFSS - IEELSEHQEDI I IHSDCDDMYDVARYYIEETGALGEVPASLQNYIDYQAY 143 

Query: 142 GRDIDLESQGGFSDYGYVEI 161 

GRD+DL +++G EI 

Sbjct: 144 GRDLDLSGTFISTNHGIFEI 163 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
Example 572

A DNA sequence (GBSx0612) was identified in S.agalactiae <SEQ ID 1795> which encodes the amino 
acid sequence <SEQ ID 1796>. Analysis of this protein sequence reveals the following: 

Possible site: 31
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1366 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 573 

A DNA sequence (GBSx0613) was identified in S.agalactiae <SEQ ID 1797> which encodes the amino 
acid sequence <SEQ ID 1798>. Analysis of this protein sequence reveals the following: 

Possible site: 41
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1484 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 574 

A DNA sequence (GBSx0614) was identified in S.agalactiae <SEQ ID 1799> which encodes the amino 
acid sequence <SEQ ID 1800>. This protein is predicted to be abortive phage resistance protein. Analysis of 
this protein sequence reveals the following: 

Possible site: 58
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2205 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10215> which encodes amino acid sequence <SEQ ID
10216> was also identified.

The protein has homology with the following sequences in the GENPEPT database:

>GP:AAB53710 GB:U94520 abortive phage resistance protein 
[Lactococcus lactis] 
Identities = 131/499 (26%), Positives = 210/499 (41%), Gaps = 97/499 (19%)

Query: 3 MFSKIEFKNFMSFSNLT FDLLNRGKCKDI IAIYGENGSGKTN 44 

M F+NF4SF L+ D+ N K + IYG N SGK++ 

Sbjct: 1 MLWFRFENFLSFDKLSTFSI^PGKSRQHMEDLIELDIKNHQKLLKLSTIYGANASGKSS 60 

Query: 45 IVEAF KLLVL SLQSMESIiNENTRLQSLLKEQTNKE ENQKENFGDISEIL 93 

V+A K L++ L S N+NT SL + + E E++ ++G S IL 

Sbjct: 61 FVDAIGISKSLIIRGFYNGLVLSNSYKKKmTDNSLNETKFEYEIVIEDKVYSYG-FSVIL 119

Query: 94 DKISFFTTFKGIAI<NTHRIASEQ^ILKYYFNIEKDNGYYLLEYlfflE^MELVKEELVFKIK 153 

F + + N 4+ Y KDN YN N+E L + 

Sbjct: 120 SLKKFMSEWLYDITNDEKM IYTIDRKDN SYNINDEF LNLDEQ 161 

Query: 154 SNKGVHFSITNIDGLSQSLNKTIFKNTIFKDLTEQIEKYV7GKHTFLSIFN--NYCLEV- - 209 

SN + I + S + N +F N++ D + IE F +FN N LEV 

Sbjct: 162 SNNRISIYIDD SAMDNTQLFmSL-MDGKKTIESKDNSTIFKKVFITOFNNTLEVLG 216 

Query: 210 NEEF INEQVSINFQKWDEFDKIFIWSGNFRGPFHSTELLLK 251 

EEF + + + +N V+D N P E +L 

Sbjct: 217 PGDEARGSIASLTQEEEEFKEDLGKYLELNDTGVIDIVQVPVDNLSNV--PAKLQERILD 274 





Query: 252 DISKGKIDKSEICEKLSYTEEIIYKYFSALYIDIKDVKYKQDAQGQEIKYELMIRKNI GGD 311
+14 I K +KE+ EI F+ + +++ Q+ Q 4EL K+ G
Sbjct: 275 NITT-DIKKKKKER EDIEISFNTILMTSQNIYIIQNNDEQFEYFELKF-KHK NGT 327

Query: 312 LLDVPISLESQGTI<NLLDLLICV-FNNVLDGKICIVDEIDSGIHDLLMNSILNDLK-- GSV 368
L +S ES GT L++L V F+N D K+ 4+DEID 4H LL 4 4 K S+
Sbjct: 328 LYS - -LSEESDGTVRLIELFSVLFHN- -DEKVFVIDEIDRSLHPLLTYNFIESFKKQ iKSI 383

Query: 369 NGQLIFTTHDTTLL- -KELSPSSAYFLNVDIKGNICVIISGNEADKKIGVNNNIiEI'CLY 426
N QLI TTH4 4L 4 L 4F44 + +GN + S E ++ +44 Y L-G
Sbjct: 384 N-QLIVTTHEDYILNFELLRRDEWFVDKNF3GNSSMFSLEEFI05RF--DKDINTSYLNG 440

Query: 427 FFGAVPDPLDIDFSDLFLD 445
+G +P+ L FS+ D
Sbjct: RYGGIPN-LSCLFSEFAKD 458







No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 575 

A DNA sequence (GBSx0615) was identified in S.agalactiae <SEQ ID 1801> which encodes the amino 
acid sequence <SEQ ID 1802>. This protein is predicted to be repressor (rstR-1). Analysis of this protein 
sequence reveals the following:

Possible site: 37
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3724 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAB84427 GB:AF027868 transcription regulator [Bacillus subtilis]
Identities = 31/81 (38%), Positives = 53/81 (65%), Gaps = 2/81 (2%)

Query: 9 QKLKELRKEKKLTQTELASKLNISQKSYSNVffiSGKAEPTLDNIIKIANILDvTVDYDLGR 68
Q4L+4LRK KLT +LA K4 144 SY 4E4 4P LD 4+ LA 4 DV+VDY4LG
Sbjct: 4 QRLRQLRKAHKLTMEQLAEKIGIAKSSYGGYEAESKKPPLDKLVILARLYDVSVDYILGL 63

Query: 69 SDNFSNTIVLSKNNMKSFSKR 89
+D+ + + K+K F ++
Sbjct: 64 TDDPDPKV- -ERKNLKEFLEK 82

There is also homology to SEQ ID 1740. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

Example 576 

A DNA sequence (GBSx0616) was identified in S.agalactiae <SEQ ID 1803> which encodes the amino
acid sequence <SEQ ID 1804>. Analysis of this protein sequence reveals the following:

Possible site: 13
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3607 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.


Example 577 

A DNA sequence (GBSx0617) was identified in S.agalactiae <SEQ ID 1805> which encodes the amino 
acid sequence <SEQ ID 1806>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0564 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10217> which encodes amino acid sequence <SEQ ID 
10218> was also identified. 

The protein has homology with the following sequences in the GENPEPT database: 

similar to transposon protein [Bacillus subtilis]
Positives = 164/348 (46%), Gaps = 28/348 (8%)

Query: 81 SRLQVMIDYVRITLKDWDLEFFCRNFLHCMKEFQPFESKLMNYNHLWKRGDIWIFDFA 140 

S L M+DY+R++ K D++ m + +S Y +4- I +F A 

Sbjct: 26 SPLVSMVDYIRVSFK-THDVDRIIEEVLKLSE^DFMTEKQSGETGYVGTYELDYIKVFYSA 84 

Query: 141 DKHETGNFQITVQLSGRGCRQLELLMETEKFII'fflDWLSYLRNSYRDDMNVTRFDIAIDEL 200 

G + +++SG+GCRQ E +E K TW+D + ++ + + TRFD+AID+ 
Sbjct: 85 PDDNRG VLIEMSGQGCRQFESFLECRKKTWYD FFQDCMQQGGSFTRFDLAIDD- 137 

Query: 201 YLGKDRFJTOQFHLSDMISKTYRHELDFESLRTVII^IGGGSI^TSDMEEIEQNRQGISLYF 260 

+ F + +++ K + E R ++ GS + SD G ++YF 

Sbjct: 138 KKTYFSIPELLKKAQKGEC- ISRFRKSDF- -NGSFDLSD GITGGTTIYF 183 
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Query: 261 GSRQSEMYFNFYEKRYE I AKQEG I TVEEALE I FELWNRYE I RLSQSKANAAVDEFI SGVP 320 

GS++SE Y FYEK YE A++ I +EE + WNRYE+RL +A A+D + 
Sbjct: 184 GSKKSEAYLCFYEKNYEQAEKYNI PLEELGD WNRYELRLKMERAQVAIDALLKTKD 23 9 

Query: 321 IGEISRGLIVSKIDVYDGKNEY--GSFQADRKWQLMFGGVEPLKFVTKPEAYSIERTLRW 378 

+ 1+ +1 + + D . ++ W G V L KP+ +++ W 

Sbjct: 240 LTLIAMQIINNYVRFVDADENITREHWKTSLFWSDFIGDVGRLPliYVKPQKDFYQKSRNW 299 

Query: 379 LSDSVSPSLAMIREYDMIVDGDYLQTItNSGEVNERGEKILDSIKASL 426 

L +S +P++ M-t- E D + L ++ E+ +-h +K+LD A + 
Sbjct: 300 LRHSCAPTMKMULEADEHLGKTDLSDMIAFJffilADKHKKMLD'vTYMADV 347 

No corresponding DNA sequence was identified in S.pyogenes. 

A related GBS gene <SEQ ID 8603> and protein <SEQ ID 8604> were also identified. Analysis of this
protein sequence reveals a RGD motif at residues 131-133. 

The protein has homology with the following sequences in the databases: 

29.4/54.5% over 342aa 

Bacillus subtilis 

EGAD|108511| hypothetical protein Insert characterized
OMNI|NT01BS0566 conserved hypothetical protein Insert characterized
GP|1881297|dbj|BAA19324.1||AB001488 SIMILAR TO ORF20 OF ENTEROCOCCUS FAECALIS TRANSPOSON TN916. Insert characterized
GP|2632787|emb|CAB12294.1||Z99106 similar to transposon protein Insert characterized
PIR|G69774|G69774 transposon-related protein homolog ydcR - Insert characterized

ORP0010K205 - 1581 of 1887)

EGAD|108511|BS0487(6 - 348 of 352) hypothetical protein {Bacillus subtilis}
OMNI|NT01BS0566 conserved hypothetical protein
GP|1881297|dbj|BAA19324.1||AB001488 SIMILAR TO ORF20 OF ENTEROCOCCUS FAECALIS TRANSPOSON TN916. {Bacillus subtilis}
GP|2632787|emb|CAB12294.1||Z99106 similar to transposon protein {Bacillus subtilis}
PIR|G69774|G69774 transposon-related protein homolog ydcR - Bacillus subtilis
%Match = 9.7
%Identity = 29.3 %Similarity = 54.4

Matches = 103 Mismatches = 146 Conservative Sub.s = 88 

153 183 213 243 273 489 519 

PFSNRGVRNEKFRILTPKNLYVSRVFR EQGKRKLTLEKYQEIKSHFG 

=11 =1111 



YLV- -ENDS--SRLQVMIDYVRITLKDVRDLEFFCPJJFLHC^FI<EFQPF3SKLKNYNHLWKRGDIWIFDFADKHETGNFQ 

=1 = l = = I I hl|:|==:| l== II = =1 = I == I =1 I I 

VIWEKraaVESPLVSMVDYIRVSFK-THDVDRIIEEVLHLSKDFMTEKQSGFYGYVGTYELDYIKVFYSAPDDNRG--- 



ITVQLSGRGCRQLELLMETEKFTWHDWLSYLRNSYRDDMNVTRFDIAIDELYLGKDRENEQFHLSDMISKYYRHELDFES 
= = = = l|:||ll = ! = = l ! 11 = 1 = = = = = = 1111 = 111 1= I -• = = = I = I 

VLIEMSGQGCRQFESFLECRKKTWYD- - -FFQDCMQQGGSFTRFDLAID DK- KTYFSI PELLKKAQKGEC - ISR 

100 110 120 130 140 150 

1017 1047 1077 1107 1137 1167 1197 1227 

LRTWNYIGGGSIOTSDMEEIEC^QGISLYFGSRQSEKYFNFYEKRYEIAKQEGITTO 

:| :: ll = = = ll I ==llli==ll 1= Mil II l = = I =11 = 11111 = 11 =1 

FRKSDF--NGSFDLSD GITGGTTIYFGSKKSEAYLCFYE-KNYEQAEKYNIPLEE LGDWNRYELRLKNERAQ 

170 180 190 200 210 220 

1257 1287 1341 1371 1401 1431 1461 

AAVDEFISGVPIGEISRGLIVSKIDVYDG-KN-EYGSFQADRKKQLMFG3\ r EPLKFVTKPEAYSIERTLRWLSDSVSPSL 
|:| :: s |: s| s s | «| s: I I | I = ||= === II =1 =1=: 

VAIDALLKTroLTLIAMQI INNYWFVDADEN I TREHKKTO ^ 
1491 1521 1551 1581 1611 1641 1671 1701

MREYDMIVDGDYLQTimSGEVNERGEKILDSIKASLGIL*EVSFvLYSNREFAYC^ 

|. | | < | :: |, =: :|:|| | = . 
KMVLEADEHLGKTDLSDMIAEAELM5KHKKMLDVYM2U3VADMVV 
320 330 340 350 

SEQ ID 8604 (GBS294) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 167 (lane 6 & 7; MW 65kDa - thioredoxin fusion), in Figure 238 (lane 2; MW
65kDa) and in Figure 40 (lane 6; MW 37kDa). It was also expressed in E.coli as a GST-fusion product.
SDS-PAGE analysis of total cell extract is shown in Figure 47 (lane 3; MW 76kDa).

Purified Thio-GBS294-His is shown in Figure 244, lane 2. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 578 

A DNA sequence (GBSx0618) was identified in S.agalactiae <SEQ ID 1807> which encodes the amino 
acid sequence <SEQ ID 1808>. Analysis of this protein sequence reveals the following:

Possible site: 40 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -3.61 Transmembrane 24 - 40 ( 20 - 41) 
INTEGRAL Likelihood = -1.97 Transmembrane 53 - 69 ( 52 - 72) 
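Each INTEGRAL line reports a likelihood score, a predicted membrane-spanning segment, and a wider flanking range in parentheses. The Python sketch below is illustrative and not part of the original analysis; it extracts the score and the segment coordinates so that segment lengths can be compared. The `Segment` type and `parse_segments` name are assumptions made for this example, and the flanking range is ignored.

```python
import re
from typing import List, NamedTuple


class Segment(NamedTuple):
    likelihood: float  # score exactly as printed on the INTEGRAL line
    start: int         # first residue of the predicted segment
    end: int           # last residue of the predicted segment


# Tolerates scanning artifacts such as "- 13 . 32" by allowing inner spaces.
TM = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\s*\d+\s*\.\s*\d+)"
    r"\s+Transmembrane\s+(\d+)\s*-\s*(\d+)"
)


def parse_segments(text: str) -> List[Segment]:
    """Return the predicted segments, ordered by start position."""
    segments = [
        Segment(float(re.sub(r"\s+", "", score)), int(start), int(end))
        for score, start, end in TM.findall(text)
    ]
    return sorted(segments, key=lambda seg: seg.start)


report = """INTEGRAL Likelihood = -3.61 Transmembrane 24 - 40 ( 20 - 41)
INTEGRAL Likelihood = -1.97 Transmembrane 53 - 69 ( 52 - 72)"""
for seg in parse_segments(report):
    print(seg.start, seg.end, seg.end - seg.start + 1, seg.likelihood)
```

Lines whose coordinates were lost in scanning simply do not match the pattern and are skipped, so the result lists only segments that can be read in full.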

Final Results 

bacterial membrane --- Certainty=0.2444 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

= 40/473 (8%) 

RGIKVKPYMRYMSYYL-FSFLFILFLTPVGVYSYYYLDL LKMMDKMSM 1 56 

RG +++P + + ++ 4- L +FL VG++ + L DK+ + I 

RGKRIRPSGKDLVFHFTIASLLPVFLLWGLFHVKTIQQINWQDFNLSQADKIDIPYLII 63 





Query: 9
Sbjct:
Query: 57
Sbjct: 64
Query: 110
Sbjct: 121
Query: 169
Sbjct: 181
Query: 229
Sbjct: 240
Query: 288
Sbjct: 296



1 DPK AD ++DL + 



K Y Y GL +F + DEY AM L +E 



Query: 347 VGOTAIIAMQKPSADDLPTKIRSNMMHHISVGRLDIXSGYV^FGDENRNKEFRFIKYLAG 406 
G I+A Q+P A L IR +++GR+ + GY MMEG + + K+F F+K 
Sbjct: 353 AGFFLII^CQRPDAKYLGDGIRDQFIIFRVALGRMSEMOTGMMFGSDVQ-KDF-FLK

Query: 407 RROTGRGYSAVFGEVAREFYSPLLPKNFSFYDAFEKMRHENPFDPTENQEVS 459 
R+ GRGY V V EFY+PL+PK + F + +K++ T EV+ 

Sbjct: 407 - 



No corresponding DNA sequence was identified in S.pyogenes. 

A related GBS gene <SEQ ID 8605> and protein <SEQ ID 8606> were also identified. Analysis of this 
protein sequence reveals the following: 



Lipop: Possible site: -1 Crend: 8
McG: Discrim Score: -10.05
GvH: Signal Score (-7.5): -3.42

Possible site: 40
>>> Seems to have no N-terminal signal sequence
ALOM program count: 2 value: -3.61 threshold: 0.0

INTEGRAL Likelihood = -3.61 Transmembrane
INTEGRAL Likelihood = Transmembrane
PERIPHERAL Likelihood = 1.01
modified ALOM score: 1.22

*** Reasoning Step: 3

Final Results

bacterial membrane --- Certainty=0.2444 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases: 

29.9/52.7% over 456aa
Enterococcus faecalis
EGAD|17035| hypothetical protein Insert characterized
GP|532554|gb|AAB60012.1| |U09422 ORF21 Insert characterized

ORF00100(319 - 1677 of 2316)
EGAD|17035|17250(2 - 458 of 461) hypothetical protein {Enterococcus faecalis}
GP|532554|gb|AAB60012.1| |U09422 ORF21 {Enterococcus faecalis}
%Match = 11.2
%Identity = 29.9 %Similarity = 52.7
Matches = 135 Mismatches = 199 Conservative Sub.s = 103

207 237 267 297 327 357


FQWCLKFLHHHLRKRMLQIMETHQKMKHLKLINKR*RRGNLARL1PQYRGIKVKPYMRYMSYYL-FSFLFILFLTPVGV 
: || :::| : : ::: : |= :|| ||: 
MKQRGKRIRPSGKDLVFHFTIASLLPVFLLWGL 



Y SYYYLDL-LKMMDKMSMISVGTGLFIAFWSWYLTWFLQEAN-FLraKLDRLKRMSKFLYENGYVYEKR 

: |: | ||: : : : :| :: : : :| ::::| : || : || 

FHVKT I QQI NWQDFNL SQADKI D I PYL 1 1 S F S VAI L I CLLVAFVFKRVRYDTVKQLYHRQKLAKM I LENKW - YE SEQVKT 



KKSNKKTKTKYR-FPKAm7KQGI<^LSVSFE^GGKFQKKFKDIGGELEDTFFMDFMEKTDDPRFKIYKLAYSAFL 

II =|| I 111 = 1 = : = 1= IN == = =11 = = == =1 =111 

EGFFKDSAGRTKEKITYFPKMYYRLKNGLIQIRVEITLGICYQDQLLHLEKKLESGLYCELTDKELKDSYVEYTLLYDTIA 



873 903 933 963 993 1020 1050 1080 

SRITVTCDVIWNKDKGIKLM)GYYOTFINDPHLLVAGGTGGGKTVLLRSI^ 

III- I = II = = 11 =1 = = I I = I = I I I I I I I I I == = = = I I III II = = ll = 
SRISI-DEVE2UCDGKLRLMKNvT^WEYDKLPHMLIAGGIGGGKTYFILTLIEALLHTDSKLYILDPKNM LADLGSVM 
1110 1140 1170 1200 1227 1257 1287 1317

GRIAFEKADIIEKFEKAVTIMFARYDBWNEMKRLGHKDMKKFYDY-GLEPYFFVCDEYWALMSSLSyQEREIVDNaFTQ 
: : I I- I I I I I I » : : I I I II = I : = III 1 = 1 I =1 11 = 1 

ANVYYRKEDLLSCIETFYEEMMKR SEEMKQMKNYKTGKMYAYLGLPAHFLIFDEYVAFMEMLGTKENTAVMHKLKQ 

280 290 300 310 320 330 340 

1347 1377 1407 1437 1467 1497 1527 1557 

YILLGRQVGCNAIIAMQKPSADDLPTKIRSN^MHHI3VGRLDDGGY\^#^FGDEICK^IKEFRF1KYLAGRRVYGRGYSAVFG 
: = llll I |:| hi I I II :::|h = II Mil = 1 = 1 1 = 1 = 11 II I 

IVMLGRQAGFFLIIACQRPDAKYLGDGIRDQFNFRVALGRMSEMGYGMKFGSD-VQKDF-FLKRIKGR GYVDVGT 

350 370 380 390 400 410 



I 111=11=11 = 1 : = l = = I 11= 

SVISEFYTPLVPKGYDFLEEIKKLSNSRQSTQATCEAEVAGVD 
430 440 450 460 

SEQ ID 8606 (GBS216) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 42 (lane 3; MW 66.6kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 47 (lane 2; MW 91kDa). 
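
The two molecular weights quoted above for GBS216 can be compared with simple arithmetic. The following Python sketch is illustrative only: the function name `fusion_mw_difference` is hypothetical, and the calculation merely subtracts the two values given in the text, which indicates how much heavier the GST-fusion product is than the His-fusion product.

```python
def fusion_mw_difference(gst_fusion_kda: float, his_fusion_kda: float) -> float:
    """Return the difference (in kDa) between two fusion-product weights.

    Hypothetical helper: the inputs are the MW values quoted in the text.
    """
    return round(gst_fusion_kda - his_fusion_kda, 1)


# Values quoted for GBS216: GST-fusion 91 kDa, His-fusion 66.6 kDa.
gbs216_difference = fusion_mw_difference(91.0, 66.6)
print(f"GST-fusion minus His-fusion: {gbs216_difference} kDa")
```

The difference reflects the fusion partners only, since the GBS216 portion is common to both constructs.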

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 579 

A DNA sequence (GBSx0619) was identified in S.agalactiae <SEQ ID 1809> which encodes the amino 
acid sequence <SEQ ID 1810>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4095 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
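
The "Final Results" block lists one certainty value per predicted cellular location, and the location with the highest certainty is the one marked "Affirmative". As a minimal sketch, assuming the report is available as plain text, the values can be read back as follows. The function name `best_location` and the regular expression are illustrative assumptions and are not part of the localization program itself; the pattern tolerates the stray spaces that appear around the decimal point in scanned copies.

```python
import re
from typing import Tuple

# Matches e.g. "bacterial cytoplasm --- Certainty=0 .4095 (Affirmative)".
ROW = re.compile(r"bacterial\s+(\w+)\W+Certainty\s*=\s*(\d)\s*\.\s*(\d+)")


def best_location(report: str) -> Tuple[str, float]:
    """Return the (location, certainty) pair with the highest certainty."""
    scores = {
        location: float(f"{whole}.{fraction}")
        for location, whole, fraction in ROW.findall(report)
    }
    return max(scores.items(), key=lambda item: item[1])


# Rows as reported for the protein of Example 579.
report = """
bacterial cytoplasm --- Certainty=0 .4095 (Affirmative)
bacterial membrane Certainty=0 . 0000 (Not Clear)
bacterial outside --- Certainty=0 . 0000 (Not Clear)
"""
print(best_location(report))
```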

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 580 

A DNA sequence (GBSx0620) was identified in S.agalactiae <SEQ ID 1811> which encodes the amino
acid sequence <SEQ ID 1812>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0944 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



A related GBS nucleic acid sequence <SEQ ID 10219> which encodes amino acid sequence <SEQ ID
10220> was also identified.

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 581 

A DNA sequence (GBSx0621) was identified in S.agalactiae <SEQ ID 1813> which encodes the amino 
acid sequence <SEQ ID 1814>. Analysis of this protein sequence reveals the following: 
Possible site: 60 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -4.94 Transmembrane 810 - 826 ( 808 - 830)

Final Results

bacterial membrane --- Certainty=0.2975 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

!GB:D90354 surface protein antigen precursor [Strept... 

>GP:BAA14368 GB:D90354 surface protein antigen precursor 
[Streptococcus sobrinus] 
Identities = 151/408 (37%), Positives = 219/408 (53%), Gaps = 27/408 (6%)

Query: 451 PSKAVIDEAGQSWGKTVLPNAEIiNYVAKQDPSQYKGMTASQGKIAKNFVFIDDYKDDAL 510 

P K +E G ++GK+VL Y D QYKG +++ I K F +4DDY ++AL 

Sbjct: 1162 PHKVNKNENGWIDGKSVIiAGTTNYYELTVroLDQYKGDKSAKETIQKGFFYVDDYPEEAL 1221 

Query: 511 IX3KSMKVNSIKASDGTDVSQL~LEMRHVLSTDTLDEKLQTLIKEAGISPVGEF^1WTA1<D 569 

D ++ + IK +D + + + S + +Q ++K+A I+P G F ++TA D 

Sbjct: 1222 D---LRTDLIKLTDANGKAVTGVSVADYASLEAAPAAVQDMLKKANITPKGAFQVFTADD 1278 

Query: 570 PQAFYKAYVQKGLDVTYNLSFKVKKEFTK- -GQIQNGVAQIDFGNGYTGNIWNDLTTPE 627 

PQAFY AYV G D+T VK E K G +N QIDFGNGY NIV+N++ 

Sbjct: 1279 PQAFYDAYVVTGTDLTIVTPMTVKAEMGKIGGSYENKAYQIDFGNGYESNIVINNVPQIN 1338 

Query: 628 IHKDV LDKEDGKSINNGTVKLGDEVTYKLEGWVVPTGRSYDLFEYKFVDQLQRTPDL 684 

KDV +D D +++ T+ L Y+L G ++P + +LFEY F D +T D 

Sbjct: 1339 PEKDVTLTMDPADSTNVDGQTIALNQVFKYRLIGGIIPADHAEELFEYSFSDDYDQTGDQ 1398 

Query: 685 YLRD-KWAKTOOTLKDGWIKKGTNLGEYTETVYNKKTGLYELVFKKDFLEKVARSSEF 743 

Y K AKVD+TLKDGT+IK GT+L YTE ++ G + FK+DFL V+ S F 
Sbjct: 1399 YTGQYKAFAKVDLTLKDGTIIKAGTDLTSYTEAQVDEANGQIWTFKEDFLRSVSVDSAF 1458 

Query: 744 GADDFVWKRIKAGDVYNTADFFINGNKVKTETVVTHTPE--KPKPVEPQ 791 

A+ ++ +KRI G NT +NG + TV T TPE +P PV+P+ 
Sbjct: 1459 QAEWLQMKRIAVGTFANTYVNTVNGITYSSNTVRTSTPEPKQPSPVnPKTTTTWFQPR 1518 

Query: 792 - -KATPKAPAKG- -LPQTGEASVAPLTALGAIILSA-IGLAGFKKRKE 834 

KA AP G LP TG++S A L LG + L+A L G +++++ 
Sbjct: 1519 QGKAYQPAPPAGAQLPATGDSSNAYLPLLGLVSLTAGFSLLGLRRKQD 1566 
Identities = 75/242 (30%), Positives = 120/242 (48%), Gaps = 33/242 (13%) 

Query: 11 SADQVTTQATTQTVTQNQAETVTSTQLDKAVATAKPCAAVAVTTTAAVNHATTTDAQADLA 70 

S+ T+QA T + V++++LD+A +A++A VV+AVNT +DA 

Sbjct: 73 SSQAETSQAQAGQKTGAMSVDVSTSELDEAAI<SAQEAGVTVSQDATVNKGTVETS--DEA 130 

Query: 71 NQTQT-VKDVTAKAQANTQAIKDATAKNAKIEAENKAESQRVSQLNAQTKAKID AEN 126 

NQ +T +KD +K A4 1+ T + A N+AE4 R++Q NA KA+ + AN 

Sbjct: 131 NQKETE I KDDYSKQAAD IQKTTEDYKAAVAANQAEMRITQENAAKKAQYEQDLAAN 187 
Query: 127 KDAQAKADATNAQLQKDYQAKLAKI KSVEAYNAGVRQRNKDAQA KA 172

K + NAQ + DY+AKLA+ + A V+Q N D4-QA + 
Sbjct: 188 KAEvERITNENAQAKADYEAKIiAQYQKDIA AVQQMMDSQAAYAAAKEAYDKELARV 244 

Query: 173 DATNAQLQKD YQAKLA LYNQALKAKAEADKQSINNVAFDIKAQ AKGVDNAEYG 225 

A NA +K+Y+ LA N+ +KA+ A +Q D +A+ K + A+ G 

Sbjct: 245 QAANAAAKKEYEEAIAANTTKNEQIKAENAAIQQRNAQAKADYEAK^ 304 

Query: 226 NS 227 



Query: 2 ITTLQTSQVSADQVTTQATTQTVTQNQAETVTSTQLDKAVATAK KAAVA 5 0 

+ +Q + +A + +A T+N+ + + + A AK K A 

Sbjct: 241 nARVQAANAAAKKEYEEALAANTTKNEQ I KAENAAI QQRNAQAKADYEAKLAQYEKDLAA 300 

Query: 51 OTTTAAVNHATTTDAQADLANQTQTVKDVTTAKA-QANTQAIKDATAENAKIDAENKAESQ 109 

+ ANA +A + V+ A A QA QA+ TA+NA+I AEN+A Q 

Sbjct: 301 AQSGMATNEADYQAKKAAYEQELARVQAAt^AA^ 360 

Query: 110 RVSQLNAQTKAKIE1AENKDAQAKADATNAQLQKDYQAKLA KI KSVEAYNAGVRQRN 165 

R +Q A 4AK+- KD A A + NA + DYQ KLA ++ V+A MA +Q 
Sbjct: 361 RI^QAKARyEAKIAQYQKDL-AAAQSGNAANEADYQEKIiAAYEKEIARVQAANAAAKQEY 419 

Query: 166 KDAQAKADATNAQL 

+ +A+A NA++ 

Sbjct: 420 EQKVQEANAKNAEITEANRAIRERNAKAKTDYELKLSKYQEEL 4 62 
Identities = 75/243 (30%) , Positives = 101/243 (40%) , Gaps = 56/243 (23%) 

8 SQVSAD-QWTQATTQTVTQNQAETVTSTQI^KAVATAKKAAVAVTTTAAVHHATTTDAQ 66 

S+ +AD Q TT+ V NQAET TQ + A A+ A V T +AQ 

142 SKQAADIQKTTEDYKAAVAAHQAETDRITQ-ENAAKKAQYEQDIAANKAEVERITNENAQ 200 

67 ADL---ANQTQTVKDVTAKAQANT - QAIK 91 

A A Q KD+ A QAN +A+ 

201 AKADYEAKIAQYQKDLAAVQQAtM3SQAAYAAAKI!AYDKEIARVQAANAAAKlffiYEEAIA 260 

92 DATAENAKIDAENKAESQRVSQLNAQTKAKIDAENKDAQAKADATKAQLQKDYQAKIiA-- 149 

T +N +1 AEN A QR +Q A +AK+ KD A A + KA + DYQAK A 
261 ANTTKIffiQIKAENAAIQQRNAQAKADYEAKLAQYEKDL-AAAQSGNATNEADYQAKKAAY 319 



Sbjct 

Sbj. 



150 --KIKSVEAYNAGVRQRNKDAQAKADATHAQL QKDYQAKLALYNQA 193 

++ v+A NA +Q + A A A NAQ+ + +Y+AKLA Y + 

320 EQELARVQAANAAAKQAYEQALAANTAKNAQITAENEAI QQRNAQAKANYEAKLAQYQKD 379 



194 LKA 196 

L A 
380 LAA 382 



There is also homology to SEQ ID 598. 

SEQ ID 1814 (GBS191) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 176 (lane 2; MW 91kDa). 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.
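
Each alignment above is summarized by a line of the form "Identities = a/b, Positives = c/b, Gaps = d/b", where each percentage is the count divided by the alignment length. A minimal sketch of recomputing those percentages from the counts is given below. The function name `parse_alignment_stats` is a hypothetical helper, and the sample line carries only the counts reported for the first alignment of Example 581, with the parenthesized percentages left out because they are recomputed.

```python
import re
from typing import Dict, Tuple

STAT = re.compile(r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)")


def parse_alignment_stats(line: str) -> Dict[str, Tuple[int, int, float]]:
    """Map each statistic name to (count, alignment length, percentage)."""
    stats = {}
    for name, count, length in STAT.findall(line):
        count, length = int(count), int(length)
        stats[name] = (count, length, 100.0 * count / length)
    return stats


# Counts taken from the first alignment summary line of Example 581.
summary = "Identities = 151/408, Positives = 219/408, Gaps = 27/408"
stats = parse_alignment_stats(summary)
for name, (count, length, percent) in stats.items():
    # The reported figures appear to be truncated to whole percent.
    print(f"{name}: {count}/{length} = {int(percent)}%")
```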



Example 582

A DNA sequence (GBSx0622) was identified in S.agalactiae <SEQ ID 1815> which encodes the amino
acid sequence <SEQ ID 1816>. This protein is predicted to be TnpA. Analysis of this protein sequence
reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2935 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10221> which encodes amino acid sequence <SEQ ID 
10222> was also identified. 

A related GBS nucleic acid sequence <SEQ ID 9921> which encodes amino acid sequence <SEQ ID 9922> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAC82523 GB:AF027768 TnpA [Serratia marcescens]
Identities = 168/385 (43%), Positives = 232/385 (59%), Gaps = 13/385 (3%)

Query: 26 MMFKVEAVGPPERCPECGFD-KLYKHSSRNQLIMDIjPIRLKRVGLHLNRRRYKCRECGST 84 

M F+V+ V P C ECG + + R+ DLPI KRV L + RRRY CR C +T 
Sbjct: 1 MHFQVD-VPDPIACEECGVQGEFTOFGKRDVPYRDLPIHGKROTLWVVRRRYTCRACKTT 59 

Query: 85 IS VDEKRSMTKRLLKSIQEQSMSKTFVEVAESVGVDEKTIRNVFKDYVALKERE 138 

VD R MT RL + ++++S + + VA G+DEKT+R++F R 
Sbjct: 60 FRPQLPEMVDGFR-MTLRLHEYVEKESFNHPYTFVAAQTGLDEKTVRDIFNARAEFLGRW 118 

Query: 139 YQFETPKWLGIDEIHIIRRPRLVLTNIERRTIYDIKPNRNKETVIQRLSEISDRTYIEYV 198 

++FETP+ LGIDE+++ +R R +LTNIE RT+ D+ R ++ V h ++ DR +E V 
Sbjct: 119 HRFETPPJLGIDELYMKRYRCILTNIEERTLLDLLATRRQDvVTNYLMKLKDRQKVEIV 178 



Query: 259 I 

ILLKR H++++RE +++TW G PL AYE KE FY IWD + +W 

Sbjct: 239 ILLKRAHEVSDRERLIMETWTGAFPQLLAAYEHKERFYGIWDATTRLQAEAALDEWI-AT 297

Query: 319 MSSNSKDAYKDLVRAVDNWHVEI FNYF- - DKRLTNAYTES INSI IRQVERMGRGYSFDAL 376 

+ K+ + DLVRAV NW E YF D +TNAYTESIN + + R GRGYSF+ + 
Sbjct: 298 I PKGQKE VWSDLVRAVGNWREETMTYFET0MPVTNAYTES INRLAKDKNREGRGYSFEVM 357 


Query: 377 RAKILFNEKLHKKRKPRFNSSAFNK 401 

RA++L+ K HKK+ P S F K 
Sbjct: 358 RARMLYTTK-HKKKAPTAKVSPFYK 381 

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 583 

A DNA sequence (GBSx0623) was identified in S.agalactiae <SEQ ID 1817> which encodes the amino
acid sequence <SEQ ID 1818>. This protein is predicted to be mercuric reductase. Analysis of this protein
sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2115 (Affirmative) < succ>

The protein has homology with the following sequences in the GENPEPT database:



Query: 1 MNKFKVNISGMTCTGCEKHVESALEKIGAKNIESSYRRGEAVFELPDDIEVESAIKAIDE 60 

M K++V++ GMTCTGCE+HV ALE +GA IE +RRGEAVFELP+ + VE+A KAI + 
Sbjct: 1 MKKYRVDVQGMTCTGCEEHVAVALENMGATGIEVDFRRGEAVFELPNALGVKTAKKAISD 60 

Query: 61 ANYQAGE IEE VS S LENVAL INEDNYDLL I IGSGAAAFSSAI KAI E YGAKVGM I ERGTVGG 120 

A YQ G+ EEV S E V L NE +YD +IIGSG AAFSSAI +A+ + YGAKV MIERGT+GG 
Sbjct: 61 AKYQPGKAEEVQSQEMVQLGNEGDYDYIIIGSGGAAFSSAIEAVKYGAKVAMIERGTIGG 120 

Query: 121 TC^IGOTPSKTLLRAGEINHLSKDNPFIGLQTSAGEVDLASLITQKDKLVSELRKQKYM 180 

TCVHIGCVPSKTLLRAGEINHL+K+NPF+C-L TSAGEVDL& LI QK++LV+ELRN KY+ 
Sbjct: 121 TCVNIGCVPSKTLLRAGE INHLAKNNP FVGLHTSAGEVDLAPL I KQKNELVTELRNSKYV 180 

Query: 181 DLIDEYNFDLIKGEAKFVDASTVEVNGTKLSAKRFLIATGASPSLPQISGLEKMDYLTST 240 

DLID+Y F+LI+GEAKFVD TVEVNG +SAKRFLIATGASP+ P I GL ++DYLTST 
Sbjct: 181 DLIDDYGFELIEGEAKFVDEKTVEVNGAPISAKRFLIATGASPAKPNIPGLNEVDYLTST 240 

Query: 241 TLLELKKIPKRLTVIGSGYIGMELGQLFHHLGSEITLMQRSERLLKEYDPEISESVEKAL 300 

+LLELKK+PKRL VIGSGYIGMELGQLFH+LGSE-TL+QRSERLLKEYDPEI SESVEK+L 
Sbjct: 241 SLLELKKVPKRLWIGSGYIGMELGQLFHNLGSEVTLIQRSERLLKEYDPEISESVEKSL 300 

Query: 301 IEQGINLVKGATFERVEQSGEIKRVYVTVNGSREVIESDQLLVATGRKPNTDSLNLSAAG 360 

+EQGINLVKGAT+ER+EQ+G+IK+V+V VNG + +IE+DQLLVATGR PNT +LNL AAG 
Sbjct: 301 VEQGINLVKGATYERIEQNGDIKKVHVEWGKKRIIEADQLLVATGRTPNTATLNLRAAG 360 

Query: 361 VETGKffiffilLIMJFGQrSKEKIYAAGDVTLGPQFVYVAAYEGGIirDNAIGGLNKKIDLS 420 

VE G EI + I+D+ +T+N +IYAAGDVTLGPQFVYVAAY+GG+ NAIGGLNKK++L 
Sbjct: 361 VEIGSRGEI I IDDYSRTTNTRIYAAGDVTLGPQFVYVAAYQGGVAAPNAIGGLNKKLNLE 420 . 

Query: 421 WPAVTFTNPTVATVGLTEEQAKEKGYDVKTSVLPLGAVPRAIVNRETTGVFKLVADAET 480 

WP VTFT P +ATVGLTE +QAKE GY+VKTSVLPL AVPRA+VNRETTGVFKLVAD++T 
Sbjct: 421 WPGVTFTAPAIATVGLTEQQAKENGYEVKTSVLPLDAVPRALVNRETTGVFKLVADSKT 480 

Query: 481 LKVLGVHIVSENAGDVIYAASLAVKFGLTIEDLTETLAPYLTMAEGLKLVALTFDKDISK 540 

+KVLG H+V+ENAGDVIYAA+LAVKFGLT++D+ ETLAPYLTMAEGLKL ALTFDKDISK 
Sbjct: 481 MKVLGAHWAENAGDVIYAATLAVKFGLTVDDIRETIAPYLTMAEGLKLAALTFDKDISK 540 

Query: 541 LSCCAG 546 

LSCCAG 
Sbjct: 541 LSCCAG 546 

There is also homology to SEQ ID 1820.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 584 

A DNA sequence (GBSx0624) was identified in S.agalactiae <SEQ ID 1821> which encodes the amino 
acid sequence <SEQ ID 1822>. This protein is predicted to be regulatory protein. Analysis of this protein 
sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4529 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>


The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAA83973 GB:AF138877 mercury resistance operon negative
regulator MerR1 [Bacillus sp. RC607]
Identities = 84/129 (65%), Positives = 105/129 (81%)

Query: 1 MIYRISEFADKCGVNKETIRyYERKNLLQEPHRTEAGYRIYSYDDVKRVGFIKRIQELGF 60 

M +RI E ADKCGVNKETIRYYER L+ EP RTE GYR+YS V R+ FIKR+QELGF 
Sbjct: 1 MKFRIGELADKCGVNKETIRYYERLGLIPEPERTEKGYRMYSQQTVDRLHFIKRMQELGF 60 

Query: 61 SLSEIYKLLGWDKDEVRCQDMFEFVSKKQKEVQKQIEDLKRIETMLDDLKQRCPDEKKL 120 

+L+EI KLLGWD4-DE 4C+DM+4F K +++Q++IEDLKRIE ML DLK+RCP+ K + 
Sbjct: 61 TLIffilDKLLGVVDRDEAKCRDMYDFTILKIEDIQRKIHDLKRIERMLMDLKERCPEHKDI 120 

Query: 121 HSCPIIETL 129 

+ CP1IETL 
Sbjct: 121 YECPIIETL 129 

There is also homology to SEQ ID 1712. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.



Example 585 

A DNA sequence (GBSx0625) was identified in S.agalactiae <SEQ ID 1823> which encodes the amino
acid sequence <SEQ ID 1824>. This protein is predicted to be Nramp metal ion transporter. Analysis of this
protein sequence reveals the following:



Possible site

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood =-13.85 Transmembrane 175 - 191 ( 169 - 201)
INTEGRAL Likelihood =-11.94 Transmembrane     - 166 ( 132 - 173)
INTEGRAL Likelihood = -9.45 Transmembrane 491 - 507 ( 481 - 509)
INTEGRAL Likelihood = -8.92 Transmembrane 375 - 391 ( 374 -    )
INTEGRAL Likelihood = -8.39 Transmembrane  72 -  88 (  69 -  93)
INTEGRAL Likelihood = -7.96 Transmembrane 280 - 296 ( 274 - 299)
INTEGRAL Likelihood = -7.17 Transmembrane 413 - 429 ( 411 - 431)
INTEGRAL Likelihood = -6.79 Transmembrane 327 - 343 ( 322 - 346)
INTEGRAL Likelihood = -3.40 Transmembrane 444 - 460 ( 443 - 462)
INTEGRAL Likelihood = -3.24 Transmembrane 132 - 148 ( 132 - 149)
INTEGRAL Likelihood = -0.96 Transmembrane 115 - 131 ( 114 - 131)

Final Results

bacterial membrane --- Certainty=0.6540 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
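
Each INTEGRAL row of the analysis for this protein gives a likelihood score, the predicted transmembrane segment, and, in parentheses, a wider flanking range. As a rough sketch, assuming the rows are available as plain text in the layout used throughout this document, the segments can be tabulated as shown below. The names `Segment` and `parse_segments` are hypothetical, and only three of the rows reported for this protein are used as sample input.

```python
import re
from typing import List, NamedTuple


class Segment(NamedTuple):
    likelihood: float  # more negative = stronger membrane-spanning prediction
    start: int
    end: int


ROW = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+(\d+)\s*-\s*(\d+)"
)


def parse_segments(text: str) -> List[Segment]:
    """Extract (likelihood, start, end) from each complete INTEGRAL row."""
    return [Segment(float(l), int(s), int(e)) for l, s, e in ROW.findall(text)]


# Three rows taken from the analysis of the protein of Example 585.
rows = """
INTEGRAL Likelihood =-13.85 Transmembrane 175 - 191 ( 169 - 201)
INTEGRAL Likelihood = -9.45 Transmembrane 491 - 507 ( 481 - 509)
INTEGRAL Likelihood = -8.39 Transmembrane 72 - 88 ( 69 - 93)
"""
segments = parse_segments(rows)
for seg in sorted(segments):
    length = seg.end - seg.start + 1
    print(f"{seg.start}-{seg.end}: likelihood {seg.likelihood}, {length} residues")
```

Rows whose coordinates are incomplete do not match the pattern and are skipped rather than guessed.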



The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAF83825 GB:AE003939 manganese transport protein [Xylella 
fastidiosa] 

Identities = 185/450 (41%), Positives = 278/450 (61%), Gaps = 29/450 (6%) 

Query: 16 ANGPSLEEINGTIEVPKDLSFFKTLLAYSGPGALVAVGYMDPGNWSTSITGGQNFQYLLI 75 

++ PSL E++ ++ V + + LLA+ GPG +V+VGYMDPGNW+T + GG F Y+L+ 
Sbjct: 35 SDSPSLGFJCtASVAVSRRGHWGFRLLAFLGPGYMVSVGYMDPGNWATGLAGGSRFGYMLL 94 

Query: 76 SIILMSSLIAMLLQYMSAKLGIWQMDIjAOAIRARTSKQLGIVLWILTELAIMATDIAEV 135 

S+IL+S+++A++LQ ++A+LGI + MDLAQA RAR S+ + LW++ ELAI+A D+AEV 
Sbjct: 95 SVILLSNVMAIVLQALAARLGIASDMDI^QACRARYSRGTTIiALWVVCELAIIACDLAEV 154 



Query: 136 IGGAIALYLLFHIPLAIAVFITVFDi/LLLLLLTKIC-FRKIEALWALILVIFLVFAYQVA 195 
IG AIAL LL +P+ V IT DV+L+LLL GFR +EA V+AL+LVIF F Q+
Sbjct: 155 IGTAIALNLLLGVP I IWGWI TAVDWLVLLLMHRGFRALEAPVI ALLLVI FGCFWQI V 214 

Query: 196 LSHPIWTDIFKGLVPTSEAFSTSHTVNGQTPLSGALGIIGATVMPHNLYLHSSWQSRKL 255 
5 L+ P ++ G VP + V L A+GI+GATW4PHNLYLHSS+VQ+R 

Sbjct: 215 LAAPPLQEVLGGFVPRWQ WftDPQALYLAIGIVGATVMPHNLYLHSSIVQTRAy 268 

Query: 256 DHNNKKDIAR- -AIRFSTFDSNIQLTVAFFVNSLLLIMGVAVFKTGSVTDPSFFGr.FKAL 313 
+ + R A+R++ DS 4 L +A F+N+ +LI+ AVF D 
Sbjct: 269 P---RTPVGRRSALRWAVADSTLALMLALFINASILIIAAAVFHAQHHFD 315



Query: 314 SNSTIMSNSILAHIASSGILSLLFAIALIASGQNSTITGTLTGQIIMEGFIHMKVPIWFR 373 

+ +LA + G+ + LFA ALLASG NST+T TL GQI+MEGF+ +++ W R 
Sbjct: 316 VEEIEQAYQLLAPVLGVGVAATLFATALLASGINSTVTATLAGQIVMEGFLRLRLRPWLR 375 

Query: 374 RIITRLlSVIPVMIOTLWSGRSTVEEHIAINmMNNSQVFLAFALPFSMLPLLIFTNSK 433 

R++TR ++++PV++ V + + T L+ SQV L+ LPF+++PLL + 

Sbjct: 376 RVLTRGLAIVPVIWVALYGEQGT GRLLLLSQVILSMQLPFAVIPLLRCVADR 428 

Query: 434 VEMDDDFKNTWIIKILGWLSVIGLIYLNMK 463 

M W++ ++ WL ++ LN+K 

Sbjct: 429 KVMGALVAPRWLM-WAWL1AGVIWLNVK 457 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.


Example 586 

A DNA sequence (GBSx0626) was identified in S.agalactiae <SEQ ID 1825> which encodes the amino 
acid sequence <SEQ ID 1826>. Analysis of this protein sequence reveals the following: 

Possible site: 20

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2590 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.



Example 587 

A DNA sequence (GBSx0627) was identified in S.agalactiae <SEQ ID 1827> which encodes the amino 
acid sequence <SEQ ID 1828>. Analysis of this protein sequence reveals the following: 

Possible site: 53

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -9.82 Transmembrane 212 - 228 ( 204 - 233)
INTEGRAL Likelihood = -8.39 Transmembrane  98 - 114 (  94 - 125)
INTEGRAL Likelihood = -7.22 Transmembrane 132 - 148 ( 122 - 154)
INTEGRAL Likelihood = -6.42 Transmembrane 159 - 175 ( 155 - 188)
INTEGRAL Likelihood = -4.78 Transmembrane  54 -  70 (  51 -  72)
INTEGRAL Likelihood = -2.97 Transmembrane  18 -  34 (  15 -  36)



Final Results 
bacterial membrane --- Certainty=0.4927 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:CAB16051 GB:Z99124 yydj [Bacillus subtilis] 
Identities = 97/239 (40%) , Positives = 154/239 (63%) , Gaps = 3/239 (1%) 

Query: 4 LEFRKSIRGRTLFYIISTVALTYVLGYILPVGIDKIRHLTLGEFYFSTYTVFTQFGFLIF 63 
LEF+KSI + + + + ++LGY L VGIDK+ ++T F+FS+YTV TQFG ++F

Sbjct: 3 LEFKKSISNKVIIILGAMFVFLFLLGYFLLVGIDKVSNVTPEMFFFSSYTVATQFGLMLF 62 

Query: 64 GFVIVYFFNKDYSDKCILYHYFSGYHLTKYFYTKLLVLFSEFFIAIIVCNILASLLWGYS 123 
FVI +F N++YS+K IL++ G ++ +FY K+ VLF E F I + ++ SL++ + 
Sbjct: 63 SFVIAFFINREYSNKNILFYKLIGENIYTFFYKK1AVLFLECFAFITLGLLIISLMY-HD 121

Query: 124 LFYFLTTTILFSLWLQYLLWSTISILFSNMLVSIGVTIFYWITSIILVAIGG-IFKVS 182 

+F LFS V+LQY+L++ TIS+L N+L+SIGV+I YW+TS+ILVAI F 

Sbjct: 122 FSHFALLLFLFSAVILQYILIIGTISVLCPNILISIGVSIVYWMTSVILVAISNKTFGFI 181 


Query: 183 AIFnASNSLYKIIGK-LFSHPMTIDLTDFFIIVPYMICLSVISFLIVCLSNRRWLLNGM^ 240 

A F+A N++Y I + L S MT+ D 1+ Y++ + +1+ +++ S RW+ G+" 
Sbjct: 182 APFEAGNTMYPRIERVLQSDNMTLGSNDVLFIILYLVSIIIINAIVLRFSKTRWIKMGL 240 

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 588 

A DNA sequence (GBSx0628) was identified in S.agalactiae <SEQ ID 1829> which encodes the amino 
acid sequence <SEQ ID 1830>. This protein is predicted to be antibiotic epidermin immunity protein F.
Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2901 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAB16052 GB:Z99124 similar to ABC transporter (ATP-binding 
protein) [Bacillus subtilis] 
Identities = 100/209 (47%), Positives = 150/209 (70%), Gaps = 4/209 (1%) 

MFINNYTLKIGNRILLENTNLDFEEGEINHLLGRNGSGKSQLAKDFI INRGNYFSNDI YE 6 0 
M I NYTLK+ + LL++T+L F G+INH++G+NG GKSQLAKDF++N DI + 

MNIANYTLKVKGKTLLQDTDLHFSSGKINHWGKNGVGKSQIAKDFLIiNNSKRIGRDIRQ 6 0 

Query: 61 DTLIISSYSNLPSDVT INDLEKTIPWKLSKEIYQIJjNINQISKTVKLKQIiSDGQKQ 116 

+ +ISS SN+P+DV+ ++ L + K-l- +1 LLN++ I V +K LSDGQKQ 
Sbjct: 61 NVSLISSSSNIPNDVSKDFLLHFLSKKFDAKMIDKIAYLLNLDNIDGKVLIKNLSDGQKQ 120 

117 KVKLL VLLSLDKHI I ILDEITNALDKKSVDEINVFLQNYIQYYPEKI I INISHDINNIRS 176 

K+KLL L DK+II+LDEITN+LDKK+V EI+ FL YIQ PEKIIINI+HD++++++ 
121 KLBCLLSFLLEDKNIIVIiDEITNSLDECKTVIEIHGFLNKYIQENPEKIIINITHDLSDLKA 180 

177 LKGNYFLIDNQKICKVDTLDDAISWYLGE 205 
++G+Y++ ++Q+I + ++D I Y+ E 
L IEGDYYIFNHQEIQQYHSVDKLIEVYINE 209 
A related DNA sequence was identified in S.pyogenes <SEQ ID 1831> which encodes the amino acid
sequence <SEQ ID 1832>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2760 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 49/174 (28%), Positives = 82/174 (46%), Gaps = 27/174 (15%)

Query: 3 INNYTLKIGNRILLENTNLDFEEGEINHLLGRNGSGKSQLRK DFIINRGN 52 

IN G R 4-L N N++ +G++ L+G NG+GKS + K II G 

Sbjct: 23 IQNLKKSYGKRTILNNvMMNI PKGKVYABIGPNGAGKSTIMKILTGLVSKTSGS 1 1 FEGR 82 

Query: 53 YFS NDIYEDTLI ISSYSNLPSDVTINDL - ERTI PWKLSKEI YQLLNINQI 101 

+S I E+ + +S+Y N+ T+ + E TI L+K + + I 

Sbjct: 83 EWSRRDLRKIGSIIEEPPLYKNLSAYDNMKWTTMLGVSESTILPLLNK VGLGNI 137 

Query: 102 SKTVKLKQLSDGQKQKVKLLVLLSLDKHI 1 1 LDEITNALDKKS VDE INVFLQNY 155 

K +KQ S G KQ++ + + L ++ILDE TN LD + E+ ++++ 

Sbjct: 138 DKR-PVKQFSLGMKQRLGIAISLINSPKLLILDEPTNGLDPIGIQELREIIESF 190 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 589 

A DNA sequence (GBSx0629) was identified in S.agalactiae <SEQ ID 1833> which encodes the amino 
acid sequence <SEQ ID 1834>. This protein is predicted to be aminoglycoside 6-adenylyltransferase. 
Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1780 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAA29839 GB:X06627 ORF (str) [Staphylococcus aureus]
Identities = 91/289 (31%), Positives = 146/289 (50%), Gaps = 14/289 (4%)

MRDEQSIYmVLNIANQDKRIEAVIjI^GSRANPNVPKDDFQDYDIVFV*ENFISDIISDTN 60 
MR E+EI NLV A Q ++ + L GSR N N+ KD FQDYD F + IE + + 
MRTEKEILNLVSEFAYQRSNVKIIALEGSRTNEMIKKDKFQDYDFAFFVSDIEYFTHEES 60 



Y VSTYV KG+ R +I+++ 





1 


Sbjct: 


1 






Sbjct: 


61 




116 


Sbjct: 


120 




176 



D S GK K I +Y+TDK+ LL F 
Sbjct: 180 DHFNNILRPELLRMISWyiGENRGFD-FSIiGKNYKFINKYLTDKEFNMLLATPEMNGYRK 238

Query: 234 IF^LRFLLDETNQMAKXISINRKIJSIIjNC2SEYQSAMKFMNIFI 1 SNSYQN 282 
+ + ++ KY S N+ L Y+K+F+ N+Y+N 

Sbjct: 239 TYQSFKLCC ELFKYYS-NKVSCLGNYNYP1IYEKNIENFIRNNYEN 282

No corresponding DNA sequence was identified in S.pyogenes. 

A related GBS gene <SEQ ID 8607> and protein <SEQ ID 8608> were also identified. Analysis of this
protein sequence reveals the following:

Lipop: Possible site: -1 Crend: 5
McG: Discrim Score: -5.26
GvH: Signal Score (-7.5): -6.14

Possible site: 33
>>> Seems to have no N-terminal signal sequence
ALOM program count: 0 value: 6.10 threshold: 0.0

PERIPHERAL Likelihood = 6.10 151
modified ALOM score: -1.72

*** Reasoning Step: 3

Final Results

bacterial cytoplasm --- Certainty=0.1780 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>


The protein has homology with the following sequences in the databases: 

31.0/53.4% over 281aa
Staphylococcus aureus
EGAD|9462| streptomycin resistance protein Insert characterized
SP|P12055|STR_STAAU STREPTOMYCIN RESISTANCE PROTEIN. Insert characterized
GP|46644|emb|CAA29839.1| |X06627 ORF (str) Insert characterized
PIR|S00938|S00938 str protein - plasmid pS194 Insert characterized

ORF00399(301 - 1146 of 1452)
EGAD|9462|9267(1 - 282 of 282) streptomycin resistance protein {Staphylococcus aureus}
SP|P12055|STR_STAAU STREPTOMYCIN RESISTANCE PROTEIN. GP|45644|emb|CAA29839.1| |X06627 ORF
(str) {Staphylococcus aureus} PIR|S00938|S00938 str protein - Staphylococcus aureus plasmid
pS194

%Match = 12.8
%Identity = 31.0 %Similarity = 53.4
Matches = 87 Mismatches = 125 Conservative Sub.s = 63

117 147 177 207 237 267 297 327 

**LMTY*H*TVENIWNHNQLLRKI *N*ILGGRKGMSMLI*VYDYMLREKYKGNIKVLEXTW*YKVK*EVAIMRDEQEIYN 
45 || |:|| | 

MRTEKEILN 

357 387 417 447 477 507 558 

LVLNIANQDKRIEAVLLNGSRANPNVPKDDFQDYDIVFVTNFIEDI I SDTNYHKKFGDILIMQKPNEFR- - -NKTEYNCF 
50 || M :: : | HI | |: || ||||| | : || : :: ||::| :||| : :| : 

LVSEFAYQRSNVKIIALEGSRTNENIKKDKFQDYDFAFFVSDIEYFTHEESWLSLFGELLFIQKPEDMELFPPDLDYG-Y 



AYLMQFQDLTRIDLRLIKPEFLEDYLDDA--FSKVLLDIOCNKYLBYNFERSSLYETKQLSEDEINKILNEIYOTSTYVVIC 
■■\--\ 1 = 1 II ■• I |: h : |:|:|| I I I I : : I I II : Hll! I 

SYIMYFKDGIKMDITLINLKDLNRYFSDSDGLVKILVDKDNLOTQEIVPDDSNY^ 

100 110 120 130 140 150 160 

822 852 882 912 942 966 996 1026 

GIARNDIIYSEFMISNPIKNAFIKLLKQKILIEKELDSLSFGKLDKDILQYITDKD--QLLKIFSNKSLKDIEANLRFLL 
1= I :|::: : I = .-::.« I = :| = I » I I I I = I = I I I : II I . : : : : 

GVFRREILFALDHFNNILRPELLRMISWYIGFNRGFD-FSLGKNYKFINKYLTDKEFNMLLATFEMNGYRKTYQSFKLCC 
180 190 200 210 220 230 240 
1056 1086 1116 1146 1176 1206 1236 1266

DETNQMAKYISINRKIi^QGEYQ 

5 ELFKYYSNKVS CLGNyNipNYELlENF™EN 

260 270 280 

SEQ ID 1834 (GBS46) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 11 (lane 6; MW 34.9kDa). It was also expressed in E.coli as a GST-fusion
product. SDS-PAGE analysis of total cell extract is shown in Figure 16 (lane 3; MW 59.8kDa).

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 590 

A DNA sequence (GBSx0630) was identified in S.agalactiae <SEQ ID 1835> which encodes the amino 
acid sequence <SEQ ID 1836>. Analysis of this protein sequence reveals the following:

Possible site: 29

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1179 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 591 

A DNA sequence (GBSx0631) was identified in S.agalactiae <SEQ ID 1837> which encodes the amino 
acid sequence <SEQ ID 1838>. Analysis of this protein sequence reveals the following:

Possible site: 44 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -2.81 Transmembrane 177 - 193 ( 177 - 194) 
INTEGRAL Likelihood = -0.27 Transmembrane 129 - 145 ( 129 - 145) 


Final Results

bacterial membrane --- Certainty=0.2125 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 8609> which encodes amino acid sequence <SEQ ID 8610>
was also identified. Analysis of this protein sequence reveals the following:

Lipop: Possible site: -1 Crend: 9
McG: Discrim Score: -13.59
GvH: Signal Score (-7.5): -4.49

Possible site: 44
>>> Seems to have no N-terminal signal sequence
ALOM program count: 2 value: -2.81 threshold: 0.0

INTEGRAL Likelihood = -2.81 Transmembrane 172 - 188 ( 172 - 189)
INTEGRAL Likelihood = -0.27 Transmembrane 124 - 140 ( 124 - 140)

PERIPHERAL Likelihood = 8.01 30
modified ALOM score: 1.06

*** Reasoning Step: 3

Final Results

bacterial membrane --- Certainty=0.2126 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 592 

A DNA sequence (GBSx0632) was identified in S.agalactiae <SEQ ID 1839> which encodes the amino
acid sequence <SEQ ID 1840>. Analysis of this protein sequence reveals the following: 

Possible site: 27

>>> Seems to have an uncleavable N-term signal seq

Final Results

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10223> which encodes amino acid sequence <SEQ ID
10224> was also identified. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAB49414 GB:AJ248284 hypothetical protein [Pyrococcus abyssi] 
Identities = 29/86 (33%), Positives = 52/86 (59%), Gaps = 4/86 (4%)

Query: 14 TYYILI^FE--EAHGYAIMQKVEEMSGGDVRIAAGTMYGAIENIiLKQKWIKSIPSD--D 69
          +Y ILL L E + HGYAI +++EE++ G + + G +Y ++ L K K ++ ++
Sbjct: 19 SYLILLILNENEKLHGYAIRKRLEELTDGKLVPSEGALYSILKMLKKYKLVEDYWAEVGG 78

Query: 70 RRRKVYIITETGKEIVELETNRLRKL 95
          R R+ Y ITE GKE+++ +R++
Sbjct: 79 RVRRYYQITELGKEVLDEIKEEIREI 104
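
The summary line that heads each alignment (Identities, Positives, Gaps) can be read mechanically. The following is a minimal sketch, assuming the header follows the "name = count/length (percent%)" layout seen in these listings; the function name `parse_alignment_stats` is hypothetical and is not part of any tool named in this document. It returns the reported percentage next to the raw ratio and does not assume how the listing rounded the figure.

```python
import re

# Matches entries such as "Identities = 29/86 (33%)". Extra spaces around the
# "=" and "/" are tolerated because the listings are not uniformly spaced.
_STAT = re.compile(r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)\s*\((\d+)%\)")


def parse_alignment_stats(header):
    """Return {name: (count, length, reported_percent, raw_fraction)}."""
    stats = {}
    for name, count, length, percent in _STAT.findall(header):
        count, length = int(count), int(length)
        stats[name] = (count, length, int(percent), count / length)
    return stats


# Header of the Pyrococcus abyssi alignment in Example 592.
header = "Identities = 29/86 (33%) , Positives = 52/86 (59%) , Gaps = 4/86 (4%)"
stats = parse_alignment_stats(header)
```

Keeping the reported percentage and the raw fraction side by side lets a reader compare them without committing to a particular rounding rule.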

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 593 

A DNA sequence (GBSx0633) was identified in S.agalactiae <SEQ ID 1841> which encodes the amino 
acid sequence <SEQ ID 1842>. Analysis of this protein sequence reveals the following: 

Possible site: 23

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0510 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10225> which encodes amino acid sequence <SEQ ID
10226> was also identified. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAF22299 GB:AF185571 putative N-acetyltransferase Camello 2 
[Homo sapiens] 

Identities = 32/110 (29%) , Positives = 54/110 (49%) , Gaps = 4/110 (3%) 

Query: 57 IKMAEQDDIFQIENYYQIffiKGQ-FWIALEl<IEKVVGSIALLRXDDKTAVLKKFFTYPKYRG 125 

+ +A + D+ I Y + G FW+A EKWG++ L +DD T K+ + 
Sbjct: 86 VDIALRTDMSDITKSYLSECGSCFWVAESEEKWGTVGALPVDDPTLREKRLQLFHLSVD 145 

Query: 126 NPW---LGRKLFERFMLFARASKFTRIVLDTPEKEKRSHFFYENQGFKQ 172 

N R + + L + FAR ++ +VLDT + + Y++ GFK+ 
Sbjct: 146 NEHRGQGIAKALVRTVLQFARDQGYSEWLDTSNIQLSAMGLYQSLGFKK 195 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 594 

A DNA sequence (GBSx0634) was identified in S.agalactiae <SEQ ID 1843> which encodes the amino 
acid sequence <SEQ ID 1844>. Analysis of this protein sequence reveals the following:

Possible site: 47

>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood =-11.94 Transmembrane 159 - 175 ( 151 - 180)
INTEGRAL Likelihood =-11.62 Transmembrane 231 - 247 ( 225 - 251)
INTEGRAL Likelihood = -9.98 Transmembrane 182 - 198 ( 177 - 203)
INTEGRAL Likelihood = -7.11 Transmembrane 118 - 134 ( 106 - 135)
INTEGRAL Likelihood = -1.49 Transmembrane 74 - 90 ( 74 - 93)

Final Results

bacterial membrane --- Certainty=0.5776 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
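
The "Final Results" block lists one certainty value per candidate location, and the entry marked "Affirmative" carries the highest value. The following is a minimal sketch of reading such a block, assuming the three-line layout shown in these listings. The names `parse_final_results` and `best_location` are hypothetical helpers, not part of the prediction program itself.

```python
import re

# One entry per line: location name, certainty value, verdict. Stray spaces
# inside the number (as in "0 . 5776") are tolerated and removed on parsing.
_LINE = re.compile(
    r"bacterial\s+(\w+)\W*Certainty\s*=\s*([0-9.\s]+?)\s*\((Affirmative|Not Clear)\)"
)


def parse_final_results(text):
    """Return {location: (certainty, verdict)} for a Final Results block."""
    results = {}
    for location, value, verdict in _LINE.findall(text):
        results[location] = (float(value.replace(" ", "")), verdict)
    return results


def best_location(results):
    """Return the location with the highest certainty value."""
    return max(results, key=lambda location: results[location][0])


# Final Results block of Example 594.
block = """
bacterial membrane --- Certainty=0.5776 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
"""
results = parse_final_results(block)
location = best_location(results)
```

For Example 594 this selects the membrane entry, which is the only one reported as affirmative.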

A related GBS nucleic acid sequence <SEQ ID 10227> which encodes amino acid sequence <SEQ ID 
10228> was also identified. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAB15891 GB:Z99123 yxlG [Bacillus subtilis] 
Identities = 42/188 (22%) , Positives = 94/188 (49%) , Gaps = 4/188 (2%) 

Query: 1 MKS^VMLKKEWMEt>nmTYtWISILITCSIFGILGPLTALMMPDIMA--GILPKKLQGAI 58 

MK + +L+KEW+E ++ K+I + I I G+ PLT MP+I+A G LP ++ + 
Sbjct: 1 MKVMMALLQKEWLEGWKSGKLIWLPIAM'IIVGLTQPLTIYYMPEIIAHGGNLPDGMKISF 60 

Query: 59 PEPTYIDSYIQYFKNMNQLGLVILVFLFSSTLTQEFSKGTLINLVTKGLAKKVIILAKFI 118 

P+ + + N LG4 +++F ++ E ++G ++++ + I++K++ 

Sbjct: 61 TMPSGSEVWSTLSQFNTLGMALVIFSVMGSVKNERNQGvTALIMSRPVTAAHYIVSKWL 120 

Query: 119 VITLLWTVSYLLSWIHFSYTLYYFSNEGSHKLMVYGATWFIGILFI-SLILFFSVLFRK 177 

+ +++ +S+ + + Y F4 + + + ++FI + L S +FR 

Sbjct: 121 IQSVIGIMSFAAGYGLAYYYWLLFEDASFSRFAASLGLYALWVIFIVTAGLAGSTIFR- 179 

Query: 178 TLGGLLGC 185 

++G C 
Sbjct: 180 SVGAAAAC 187 



WO 02/34771 -676- PCT/GB01/04789



No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 595 

A DNA sequence (GBSx0635) was identified in S.agalactiae <SEQ ID 1845> which encodes the amino 
acid sequence <SEQ ID 1846>. This protein is predicted to be ABC transporter, ATP-binding protein. 
Analysis of this protein sequence reveals the following: 

Possible site: 14

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3431 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10229> which encodes amino acid sequence <SEQ ID
10230> was also identified.

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAB12736 GB:Z99108 similar to ABC transporter (ATP-binding 
protein) [Bacillus subtilis]
Identities = 105/299 (35%), Positives = 175/299 (58%), Gaps = 11/299 (3%) 



            E++ VGL D

            K+ TYS GM+QRLGLAQ L+HDPK++I DEPT+ LDP G ++I D + L E+

Query: 181 TVIFSTHILSDVEKICDHVLVLTKCGIYSLEELKGKKS3ENYSVRILIKVTKSEAKVLSH 240
           VI S+H+LS++E +CD + +L K + ++ +K + +EN + ++ SEA + +
Sbjct: 185 AVIVSSHLLSEMELMCDRIAILQKGKLIDIQNVKDENIDENDTYFFQVE-QPSEAATVLN 243



There is also homology to SEQ ID 686. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.



Example 596 

A DNA sequence (GBSx0636) was identified in S.agalactiae <SEQ ID 1847> which encodes the amino 
acid sequence <SEQ ID 1848>. Analysis of this protein sequence reveals the following: 

Possible site: 34

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4040 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:AAB71491 GB:U53767 ORF6 [Bacillus pumilus]

Identities = 39/134 (29%) , Positives = 71/134 (52%) , Gaps = 16/134 (11%) 



Query: 62 GYQKTLSDDQRNQLII03LKIICANVLSERDFFQEVKELSKQFPNDFKTLLIMINM--VLSN 119 

S +Q ++ KDL+ + 
Sbjct: 64 SPEQYSEEQKDLETRIE- - 

Query: 120 I 



There is also homology to SEQ ID 1740. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 597 

A DNA sequence (GBSx0637) was identified in S.agalactiae <SEQ ID 1849> which encodes the amino
acid sequence <SEQ ID 1850>. Analysis of this protein sequence reveals the following:

Possible site: 20

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood =-13.59 Transmembrane 152 - 168 ( 145 - 173)
INTEGRAL Likelihood = -9.71 Transmembrane 7 - 23 ( 3 - 27)
INTEGRAL Likelihood = -6.95 Transmembrane 125 - 141 ( 122 - 146)
INTEGRAL Likelihood = -4.51 Transmembrane 85 - 101 ( 83 - 102)
INTEGRAL Likelihood = -3.35 Transmembrane 55 - 71 ( 54 - 75)

Final Results

bacterial membrane --- Certainty=0.6434 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
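
In the listings of this section, the "ALOM program count ... value" line appears to report the number of INTEGRAL segments and the most negative likelihood among them (for example, count 2 and value -2.81 in Example 591). That reading is inferred from the listings and not from the program's documentation. The following is a minimal sketch of extracting the segments and that summary under this assumption. The names `transmembrane_segments` and `alom_summary` are hypothetical.

```python
import re

# Captures the likelihood and the core transmembrane span of each INTEGRAL line.
# The extended span in parentheses is deliberately ignored.
_TM = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+(\d+)\s*-\s*(\d+)"
)


def transmembrane_segments(text):
    """Return [(likelihood, start, end)] for every INTEGRAL line in text."""
    return [(float(score), int(start), int(end)) for score, start, end in _TM.findall(text)]


def alom_summary(segments):
    """Return (segment count, most negative likelihood), or (0, None) if empty."""
    if not segments:
        return 0, None
    return len(segments), min(score for score, _, _ in segments)


# INTEGRAL lines of Example 597 (GBSx0637).
listing = """
INTEGRAL Likelihood =-13.59 Transmembrane 152 - 168 ( 145 - 173)
INTEGRAL Likelihood = -9.71 Transmembrane 7 - 23 ( 3 - 27)
INTEGRAL Likelihood = -6.95 Transmembrane 125 - 141 ( 122 - 146)
INTEGRAL Likelihood = -4.51 Transmembrane 85 - 101 ( 83 - 102)
INTEGRAL Likelihood = -3.35 Transmembrane 55 - 71 ( 54 - 75)
"""
segments = transmembrane_segments(listing)
count, value = alom_summary(segments)
```

For Example 597 the sketch yields five segments with a strongest value of -13.59.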



The protein has homology with the following sequences in the GENPEPT database:

>GP:CAA79986 GB:Z21972 ORF2 [Bacillus megaterium]
Identities = 51/186 (27%) , Positives = 106/186 (56%) , Gaps = 5/186 (2%) 

Query: 5 SFFQCVILLVSFLVLTLAVKSQSDMISYLDNITSAFFQSIRNPDLTNLMTIISTWSPLT 64 

+F V+ L+ F + + S ++ + + +++ S Q +P LT++M + + S + 
Sbjct: 10 AFIISVLSLIGFSFMAFTI-SANEYLKFDEDVIS-LVOGWESPLLTDIMKFFTYIGSTAS 67 

Query: 65 TSLI ALVI LGYQY - FLNQRI AVVWjFM- LFFGTNALALLLKDI I ARHRP - MNQLVFDSGYS 121 

+++LVIL + Y L R+ + LF + G+ L L++K R RP +++L+ GYS 
Sbjct: 68 LIILSLVILFFLYRILKHRLELVLFTAVMVGSPLENIjMVKLFFQRARPDLHRLIDIGGYS 127 

Query: 122 FPSGHTISAFLLMILVLWARQRLRRVLSQWFVIFALVILASVIFSRLYLENHFLTDIL 181 

FPSGH ++AF L ++ + + + ++++ ++F+++++ S+ SR+YL H+ +DI+ 
Sbjct: 128 FPSGHAMNAFSLYGILTFLLWRHITARWARILLILFSMLMILSIGISRIYLGVHYPSDII 187 

Query: 182 GSLLLG 187 

L G 

Sbjct: 188 AGYLAG 193

There is also homology to SEQ ID 1852.

A related GBS gene <SEQ ID 8611> and protein <SEQ ID 8612> were also identified. Analysis of this
protein sequence reveals the following:

Lipop: Possible site: -1 Crend: 3
McG: Discrim Score: 11.91
GvH: Signal Score (-7.5): -4.6
Possible site: 20
>>> Seems to have an uncleavable N-term signal seq
ALOM program count: 5 value: -13.59 threshold: 0.0
INTEGRAL Likelihood =-13.59 Transmembrane 152 - 168 ( 145 - 173)
INTEGRAL Likelihood = -9.71 Transmembrane 7 - 23 ( 3 - 27)
INTEGRAL Likelihood = -6.95 Transmembrane 125 - 141 ( 122 - 146)
INTEGRAL Likelihood = -4.51 Transmembrane 85 - 101 ( 83 - 102)
INTEGRAL Likelihood = -3.35 Transmembrane 55 - 71 ( 54 - 75)
PERIPHERAL Likelihood = 1.16 184
modified ALOM score:

*** Reasoning Step: 3

Final Results

bacterial membrane --- Certainty=0.6434 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

ORF01359(313 - 864 of 1212)
EGAD|16772|16959(10 - 194 of 216) hypothetical protein {Bacillus megaterium}
GP|288301|emb|CAA79986.1||Z21972 ORF2 {Bacillus megaterium} PIR|S32217|S32217 hypothetical
protein 2 - Bacillus megaterium
%Match = 9.5
%Identity = 28.2 %Similarity = 60.1
Matches = 53 Mismatches = 68 Conservative Sub.s = 60

66 S6 126 156 186 216 246 276 

SFFIEFTHPFLIICWIHYSLRFKYIVAILLY**KFER*LIGKVRIWYFF*FVNSHI*T*KVSAYFKHFLNIIMNV*RFI 



306 336 366 396 426 456 486 516 

SLLK*GYWNKKSFFQCVILLVSFLVLTLAVKSQSDMISYLDNITSAFFQSIRNPDLTNLMTIISTWSPLTTSLIALVI 
40 :| |s |: | ::: | s: ::::::: | :| ||::| : : | : :::||| 

MKLKQQLTIAFIISVLSLIGFSFMAFTI-SANEYLKFDEDV-ISLVQGVIESPLLTDIMKFFTYIGSTASLIILSLVI 
10 20 30 40 50 60 70 



543 570 600 630 657 687 714 744 

45 LGYQY-FI^QRIAWLFM-LFFGTNALALLLKDIIARHRP-MNQLVFDSGYSFPSGHTISAFLLM-ILVLVVARQRLRRV 
1 = 1 M: : II = 1 = I l = = l 111-1= I I I I I I I I = = I I I II = = = 1= = 
LFFLYRILKHRLELVLFTAVMVGSPLIjNLMVKLFFQRARPDLHRLIDIG^YSFPSGHAMI^FSLYGILTFLLWRH-ITAR 
90 100 110 120 130 140 150 

50 774 804 834 864 894 924 954 984 

LSQWFVIFALVILASVIFSRLYLERHFLTDILGSLLLC-ASSYYGLSAIVSLKELQ*K**LPMNYKRAFLKGSFIIHYFS 

:::::::| : :::: |: | | : | | |: : | | : | | 
WARILLILFSMLMILSIGISRIYLGVHYPSDIIAGYLAC-GCKIAISIWFFQRYQDRRKNKDR 
170 180 190 200 210 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 598 

A DNA sequence (GBSx0638) was identified in S.agalactiae <SEQ ID 1853> which encodes the amino 
acid sequence <SEQ ID 1854>. Analysis of this protein sequence reveals the following:

Possible site: 41

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4288 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database:

>GP:CAB15470 GB:Z99121 yvdC [Bacillus subtilis]
Identities = 53/96 (55%), Positives = 70/96 (72%)

Query: 1 MDITDYQKWVSEFYKKRNWYQYNSFIRSNFLSEEVGELAQAIRKYEIGRDRPDETEQTDL 60
         M + D +KW+ EFY+KR W +Y FIR FL EE GELA+A+R YEIGRDRPDE E +
Sbjct: 1 MQLADAEKWMKEFYEKRGWTEYGPFIRVGFLMEEAGELARAVRAYEIGRDRPDEKESSRA 60

Query: 61 ENUTOIKEELGDVLDNIFILADQYNISLEEIISAHR 96
          E ++ EE+GDV+ NI ILAD Y +SLE+++ AH+
Sbjct: 61 EQKQELIEEMGDVIGNIAILADMYGVSLEDVMKAHQ 96

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.



Example 599 

A DNA sequence (GBSx0639) was identified in S.agalactiae <SEQ ID 1855> which encodes the amino
acid sequence <SEQ ID 1856>. Analysis of this protein sequence reveals the following: 

Possible site: 54 

>>> Seems to have no N-terminal signal sequence 

Final Results

bacterial cytoplasm --- Certainty=0.0635 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:BAB06803 GB:AP001517 unknown conserved protein [Bacillus halodurans] 
Identities = 83/186 (44%) , Positives = 117/186 (62%) 



Sbjct: 1 MKIAVFCGSSNGASDWKEGARQLGKELARRGITLVYGGASVGIMGAVADSVLEAGGEVI 60 

Query: 61 GVIPTFLRDREIAHENLSELIIVNNMPERKAKMMLLGDAFIALPGGPGTLEEISEVISWS 120 

GV+P FL + EI+H +L++LI+V M ERKAKM L D F+ALPGGPGTLEE E+ +W+ 
Sbjct: 61 GVMPRFLEEPEISHPHLTK1IVVETMHERKA1CMAELADGFLALPGGPGTLEEFFEIFTWA 120 

Query: 121 RIGQNDNPCILYNVNGYFNDLKNMFDffiWGEGFLSLEDRENVLFSDDITEIEDFITNYKV 180 

+IG + PC L N+N YF+ L + HM E FL + R L D + D + Y+ 
Sbjct: 121 QIGLHQKPCGLIiNINHYFDPLVTLLHHMSNEQFLHEKYRSt'LALVHTDPILLLDQFSTYEP 180 

Query: 181 PSTRQY 186 

P+ + Y 
Sbjct: 181 PTVKAY 186 
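
The identity count in an alignment header is the number of aligned columns in which the query and subject residues are the same. The following is a minimal sketch of that count for one block of an alignment, assuming the two rows have already been extracted as equal-length strings with "-" marking gaps. The function name `count_identities` is hypothetical. Counting "Positives" would additionally need a substitution matrix, which is outside this sketch.

```python
def count_identities(query, subject):
    """Count aligned columns where both rows carry the same residue."""
    if len(query) != len(subject):
        raise ValueError("aligned rows must have the same length")
    # A column where both rows show a gap character is not an identity.
    return sum(1 for q, s in zip(query, subject) if q == s and q != "-")


# Last block of the Example 599 alignment (Query 181-186, Sbjct 181-186).
identical = count_identities("PSTRQY", "PTVKAY")
```

For this block only the first and last columns match, which agrees with the two letters shown on the middle line of the block ("P+ + Y").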

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 600

A DNA sequence (GBSx0640) was identified in S.agalactiae <SEQ ID 1857> which encodes the amino 
acid sequence <SEQ ID 1858>. Analysis of this protein sequence reveals the following:

Possible site:

>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood = -7.86 Transmembrane 222 - 238 ( 214 - 239)
INTEGRAL Likelihood = -6.69 Transmembrane 39 - 55 ( 36 - 58)
INTEGRAL Likelihood = -4.25 Transmembrane 266 - 282 ( 266 - 284)
INTEGRAL Likelihood = -1.28 Transmembrane 166 - 182 ( 166 - 182)
INTEGRAL Likelihood = -1.01 Transmembrane 190 - 206 ( 190 - )
INTEGRAL Likelihood = -0.96 Transmembrane 70 - 86 ( 70 - 86)

Final Results

bacterial membrane --- Certainty=0.4142 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAB12420 GB:Z99107 ydiL [Bacillus subtilis] 
Identities = 40/132 (30%), Positives = 63/132 (47%), Gaps = 8/132 (6%)



Query: 107 ESQNYDATFNI LMISYSVWGPFFEEVLYRGIVLNLL-SKYGKWFAIITSGILFG 160 

ES+N A ++ LMI S +VGP EE+++R 1+ L K +FA + S ++FG 

Sbjct: 114 ESENTQAILDVIQAVPLMIIVSSIVGPILEEIIFRKIIFGALYEKTNFFFAGLISSVIFG 173 

Query: 161 LMHQDISQLLTTSIAGIIMGFI-AYHYSFCTALLLHICNNFIVEIFTQLSTVNELYGTYF 219 

++H D+ LL + G F+ A V + H+ N V + QL V 

Sbjct: 174 IVHADLKHLLLYTAMGFTFAFLYARTKRIMVPIFAHLMMNTFV-VIMQLEPVRNYLEQQS 232 

Query: 220 ENILLILAILFI 231 

+ LI+ LF+ 
Sbjct: 233 TQMQLIIGGLFL 244 



No corresponding DNA sequence was identified in S.pyogenes. 

A related GBS gene <SEQ ID 8613> and protein <SEQ ID 8614> were also identified. Analysis of this
protein sequence reveals the following:

Lipop: Possible site: -1 Crend: 6
McG: Discrim Score: 12.52
GvH: Signal Score (-7.5): -1.74
Possible site: 19
>>> Seems to have a cleavable N-term signal seq.
ALOM program count: 2 value: -6.69 threshold: 0.0
INTEGRAL Likelihood = -6.69 Transmembrane 39 - 55 ( 36 - 58)
INTEGRAL Likelihood = -0.96 Transmembrane 70 - 86 ( 70 - 86)
PERIPHERAL Likelihood = 4.56 21
modified ALOM score: 1.84

*** Reasoning Step: 3

----- Final Results -----

bacterial membrane --- Certainty=0.3675 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

Query: 10 LIGLILLAQAIVLSLATTLFAEILQNDVWIGIASTLIALLIPCF 53 

L+ L LL ++++LS++ +L +W+ +A+ L+A ++ CF 

Sbjct: 21 LLCLCLLWSLLLSVSLYSALIIJjvTiILWvTV7ATPIiLAFVVSCF 64

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 601 

A DNA sequence (GBSx0641) was identified in S.agalactiae <SEQ ID 1859> which encodes the amino 
acid sequence <SEQ ID 1860>. This protein is predicted to be capa protein. Analysis of this protein
sequence reveals the following: 

Possible site: 50

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood =-13.80 Transmembrane 27 - 43 ( 22 - 50)

Final Results

bacterial membrane --- Certainty=0.6519 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9385> which encodes amino acid sequence <SEQ ID 9386> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAF13661 GB:AF188935 pX02-56 [Bacillus anthracis] 
Identities = 68/224 (30%), Positives = 118/224 (52%), Gaps = 10/224 (4%)

Query: 95 FKEVKSWIESADLAIGDYEGTISSE YPLAGYPL-FNAPNEIATTMKETGYDVVDLA 149
          F+ V +++++D G++E + E Y A + +A E +KE G+ V++LA
Sbjct: 87 FRHVSPYLKNSDWSGNFEHPVLLEDKJCNYQKADKNIHLSAKEETVKAVKEAGFTVLNLA 146

Query: 150 HNHILDSQIAGAINIVKTENRIX3IiDTIGVYLKDRNKEDILIKHVNGIKIAILGYSYGY-N 208
           +NH+ D G +T+K F LD +G ++ ++I+ ++VNG+++A LG++ +
Sbjct: 147 NNHMTDYGAKGTKDTIKAFKIADLDWGAGENFKDVKNIVYQNVNGVRVATLGFTDAFVA 206

Query: 209 GMEANVSKSDYEKHMSDLDTKKIKQDIKKAEKEADITIVMPQMGIEYQKKPTTEQVMLYH 268
           G A + D+ K+I + + AD+ +V G EY KP+ Q L
Sbjct: 207 GAIATKEQPGSLSMNPDVLLKQISKAIODPKKGNADLvvvNTHWGEEYDNKPSPRQEALAK 266

Query: 269 SMIKWGADI I FGGHPHWEPSEVIKKDGQKKFI IYSMGNFI SNQ 312
           +M+ GADII G HPHV++ +V K+ I YS+GNF+ +Q
Sbjct: 267 AMVDAGADIIVGHHPHVLQSFDVYKQG 1 IFYSLGNFVFDQ 306

A related DNA sequence was identified in S.pyogenes <SEQ ID 1861> which encodes the amino acid 
sequence <SEQ ID 1862>. Analysis of this protein sequence reveals the following: 

Possible site: 45

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood =-12.05 Transmembrane 44 - 60 ( 40 - 68)

Final Results

bacterial membrane --- Certainty=0.5819 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related sequence was also identified in GAS <SEQ ID 9119> which encodes the amino acid sequence
<SEQ ID 9120>. Analysis of this protein sequence reveals the following:

Possible cleavage site: 31
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial membrane Cert<
bacterial outside Cert;
bacterial cytoplasm --- Certainty=0.000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below: 

Identities = 232/334 (69%), Positives = 273/334 (81%), Gaps = 4/334 (1%) 


Query: 24 YQKTLI FCVAVI IAI FILGLSKDLAQSKGQKVANNNT VKTARVVANGDILLHDVLY 79 

Y+KT+ VA+I+A+ + GL DL + ++A + VKTARWANGDIL+HD4LY 
Sbjct: 40 YKKT^TWALIVALLLFGLIYDLLGVQI<raimQKSAQPIWKTARVVANGDILIHDILY 99 

Query: 80 ASARQPDGTYNFTPYFKEWSWIESADIAIGDYEGTISSEYPLAGYPLFNAPNEIATTMK 139

SAR+ D TY+FTPYF+ VK WI ADLAIGDYEGTIS +YPLAGYPLFNAP EIA +K 
Sbjct: 100 MSARKADDTYDFTPYFEYVKDWISGADLAIGDYEGTISPDYPLAGYPLFNAPEEIAGALK 159 

Query: 140 ETGYDWDIAHNHILDSQIAGAINTVTCTF^ 199 
TGYDWDLAHNHILDSQL GA+NT K F++LG+D+IG+Y KDR+KE LIK+VNGIKIA

Sbjct: 160 ICTGYDVVDIjAHKHILDSQLDGALOTKKVFHQLGIDSIGIYDKDRSKESFLIKNWG 219 



Query: 200 ILGYSYGYNGMEANVSKSDYEKHMS3LDTKKIKQDIKKAEKEADITIVMPQMGIEYQKKP 259 

ILGYSYGYNGMEA +S+ DYEKHMSDLD KIK++++ AEK+AD+TIVMPQMG EY +P 
Sbjct: 220 ILGYSYGYNGMEATLSQEDYEKHMSDLDEAKIKKELQIAEKKADVTIVMPQMGTEYAIiEP 279 

Query: 260 TTEQVMLYHSMIKWGADIIFGGHPHVVEPSEVIKKDGQKKFIIYSMGNFISNQRLETVDD 319 

T EQ LYH MI WGAD+ + GGHPHV+EPSE + K QKKFIIYSMGNFI SNQRLETVDD 
Sbjct: 280 TAEQECELYHKMIDWGADWLGGHPHVIEPSETVIKGRQKKFIIYSMGNFISNQRLETVDD 339 

Query: 320 IWTERGLLMDVTIEKKGQKTVIKKVKAHPTLVEA 353 

IWTERGLLMD+T EKK KT IK V+AHPT+V A 
Sbjct: 340 IWTERGLLMDLTFEKKDNKTKIKTVEAHPTMVTJi 373 



A related GBS gene <SEQ ID 8615> and protein <SEQ ID 8616> were also identified. Analysis of this
protein sequence reveals the following:

Lipop: Possible site: -1 Crend: 7
SRCFLG: 0
McG: Length of UR: 18
Peak Value of UR: 3.83
Net Charge of CR: 2
McG: Discrim Score: 15.36
GvH: Signal Score (-7.5): -1.52
Possible site: 32
>>> Seems to have a cleavable N-term signal seq.
Amino Acid Composition: calculated from 33
ALOM program count: 0 value: 4.35 threshold: 0.0
PERIPHERAL Likelihood = 4.35 170
modified ALOM score: -1.37

*** Reasoning Step: 3

----- Final Results -----

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

30.6/53.3% over 230aa

Bacillus anthracis

EGAD|20151| capa protein Insert characterized
SP|P19579|CAPA_BACAN CAPA PROTEIN. Edit characterized
GP|142633|gb|AAA22288.1||M24150 46 Kd encapsulation protein CapA Insert characterized
PIR|C30091|C30091 capA protein - Insert characterized

ORF02075(574 - 1257 of 1734)
EGAD|20151|20674(83 - 313 of 411) capa protein {Bacillus anthracis} SP|P19579|CAPA_BACAN
CAPA PROTEIN. GP|142633|gb|AAA22288.1||M24150 46 Kd encapsulation protein CapA {Bacillus
anthracis} PIR|C30091|C30091 capA protein - Bacillus anthracis
%Match = 8.9
%Identity = 30.5 %Similarity = 53.3
Matches = 70 Mismatches = 102 Conservative Sub.s = 52

468 498 528 558 585 615 645 663 

LAQSKGQKVANNNTVKTARWANGDILLHDVLYASARQPDGTYNFTPY-FKEVKSWIESADLAIGDYEGTI SSEYP 

10 :| : |: || :: |=: : | | |: | ss:::| |::| : | 

IAATWVQRTEATOPVKHRENEKLTMTMVGDII^^ 

50 60 70 80 90 100 110 

690 720 750 780 810 840 870 900 

1 5 IAGYPL- FNAPNEIATTMKETGYDVVDIJfflNHILDSQ 

I : ::| I HI 1= |::||:||: 1 I =1=1 I II =1 == ==1= ::|||:::| 

KADKNIHLSAKEETOKAVKEAGFTVUJLANNHM^ 

130 140 150 160 170 180 190 

20 927 957 987 1017 1047 1077 1107 1137 

LGYSYGY-NGMEAWSKSDYEKHMSDLDTKKIKQDIKKAEKEADITIVMPQMGIEYQKKPTTEQVMLYHSMIKWGADIIF 
II" •- I I : h hi : : II: :| : I II II: I I :h Mill 

LGFTDAFVAGAIATKEQPGSLSMNPDVLLKQISKAKDPKXGNADL^^ 

210 220 230 240 250 260 270 

25 

1167 1197 1227 1257 1287 1317 1347 1377 

GGHPHVVEPSEVIKKDGQKKFIIYSMGNFISNQRLETVDDIWTERGLLM 

I ||||:: • | | , | ||,|||: :| I ■ =11 = = I I 

GHHPHVLQSFDVYK QGIIFYSLGNFVFDQGWTRTKDSALVQYHLRDNGTAILDvVPLNIQEGSPKPVASALDKNRV 

30 290 300 310 320 330 340 350 

SEQ ID 8616 (GBS289) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 57 (lane 5; MW 40kDa), in Figure 181 (lane 6; MW 47kDa), in Figure 169 (lane
13 & 14; MW 54.5kDa - thioredoxin fusion) and in Figure 239 (lane 3; MW 54.5kDa). It was also
expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell extract is shown in Figure 61
(lane 5; MW 65kDa).

SEQ ID 8616 (GBS289L) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total
cell extract is shown in Figure 126 (lane 2; MW 72kDa) and in Figure 184 (lane 5; MW 72kDa). It was also
expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell extract is shown in Figure 126
(lane 5-7; MW 47kDa).

GBS289L-His was purified as shown in Figure 234, lane 9-10. Purified GBS289L-GST is shown in Figure 
245, lane 10. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 602

A DNA sequence (GBSx0642) was identified in S.agalactiae <SEQ ID 1863> which encodes the amino 
acid sequence <SEQ ID 1864>. This protein is predicted to be thiamin biosynthesis protein ThiI (thiI).
Analysis of this protein sequence reveals the following: 

Possible site: 55
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2720 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9971> which encodes amino acid sequence <SEQ ID 9972>
was also identified.

The protein has homology with the following sequences in the GENPEPT database:

>GP:AAC00308 GB:AF008220 YtbJ [Bacillus subtilis] 
Identities = 184/354 (51%) , Positives = 249/354 (69%) 



           LK++FGIQ+FS + K++D+ ++ YKG TFK+ KR+

           ++ PDI L++EIR+EA +++ D +GAGGLPVG++GK

           P TRR M++I DRIRE RNGL II GESLGQVASQTLES

           M AINAVT+TPI +RP++ MDK EII+ +++I T++ SIQPFEDCCTIF +P+
Sbjct: 301 mAINAVTSTPILRPLISMDKTEIIEKSREIGTYETSIQPFEDCCTIFTTAKPR 354

A related DNA sequence was identified in S. pyogenes <SEQ ID 1865> which encodes the amino acid 
sequence <SEQ ID 1866>. Analysis of this protein sequence reveals the following: 

Possible site: 42 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm Certainty=0 .4897 (Affirmative) < suco 

bacterial membrane Certainty=0. 0000 (Not Clear) < suco 

bacterial outside Certainty=0. 0000 (Not Clear) < suco 

An alignment of the GAS and GBS proteins is shown below: 

Identities = 316/404 (78%), Positives = 362/404 (89%)

           E+LK +FG+QA SP +K+EK+V LV AVQ-^IKTS+Y+DG+TFKI KRSDH+FELD

           SR LN LG AVF VLPNI+AQMK PD+ LKVEIRDEAAYI SYE+ 1 +GAGGLPVGTSGKG

           MLMLSGGIDSPVAGYLALKRG+DIE VHFASPPYTSPGAL KA DLTR+LT+FGGNIQFI

Query: 251 EVPPTEIQEEIKaKAPEAYLMTLTRRFMMRITDRIREDRTJGLVIINGESLGQVASQTLES 310
           EVPFTEIQEEIK KAPEAYLMTLTRRF.MMRITD IRE R GLVI+NGESLGQVASQTLES
Sbjct: 241 EVPFTEIQEEIKNK&PEAYLMTLT3RF.MMRITDAIREQRKGLVIWGESLGQVASQTLES 300

Query: 311 MQAINAVTATPIIRPWTMDKLEIIDIAQKIDTFDISIQPFEDCCTIFAPDRPKTNPKIK 370
           MQAINAVT+TPIIRPWTMDKLKII++AQ IDTFDISIQPFEDCCTIFAPDRPKTNPK.+
Sbjct: 301 MQAINAVTSTPIIRPWTMDKLEIIEMAQAIDTFDISIQPFEDCCTIFAPDRPKTNPKLG 360

Query: 371 NTEQYEKRMDVEGLVERAVAGIMVTTIQPQADSDDVDDLIDDLL 414
           N E+YE+ D++GLV+RAV+GI+VT I P+ +D+V++LID LL
Sbjct: 361 NAEKYEECFDIDGLVQRAVSGIWTEITPEIVNDEVENLIDALL 404

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 603 

A DNA sequence (GBSx0643) was identified in S.agalactiae <SEQ ID 1867> which encodes the amino
acid sequence <SEQ ID 1868>. This protein is predicted to be nifS protein homolog, fragment. Analysis of
this protein sequence reveals the following:

Possible site: 47

>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -0.27 Transmembrane 131 - 147 ( 131 - 147)

Final Results

bacterial membrane --- Certainty=0.1107 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAA43493 GB:X61190 nifS-like gene [Lactobacillus delbrueckii] 
Identities = 177/353 (50%), Positives = 234/353 (66%), Gaps = 1/353 (0%)

Query: 14 PEVLRTYQEVASKIYGNPSSLHELGTTSSRILEASRKQIASLLELKANEIFFTSGGTEAD 73
          P+ L TY +V +KI+GNPSSLH+LG + +LEASRKQ+A LL + +EI+FTSGGTE++
Sbjct: 3 PKALETYSQWTKIWGNPSSLHKLGDRAHGLLEASRKQVADLLGVNTDEIYFTSGGTESN 62

Query: 74 NWVIKGLAFEKQHFGNHIIVSDIEHPAVKESAKWLGEYGFEIDYAPVDDKGFVDVEALVK 133
          N I KG A+ K+ FG HII S +EH +V + L GF + PVD +G V+ E L
Sbjct: 63 NTAIKGTAWAKREFGKHIITSSVEHASVANTF'.rELSNLGFRVTRLPVDKEGRVNPEDLKA 122

Query: 134 LIKPETILISIMAINNEIGSIQPIKAISDLLSDKPTISFHVDAVQAIGKIPTKDYLTERV 193
           + +T L+SIM +NNEIG+IQPIK IS++L+D P I FHVD VQA+GK T RV
Sbjct: 123 ALDKDTTLVSIMGVNNEIGTIQPIKEISEILADYPNIHFHVDNVQALGKGIWDQVFTSRV 182

Query: 194 DFASFSSHKFHGVRGVGFLYIKEGKRISPLLTGGGQETDLRSTTENVAGIAATAKALRMV 253
           D SFSSHKFHG RG+G LY K G+ + PL GGGQE LRS TEN+A IAA AKA R++
Sbjct: 183 DMMSFSSHKFHGPRGIGILYKKRGRMLMPLCEGGGCEKGLRSGTENLAAIAAMAICAARLL 242

Query: 254 MDKEVVAIPKISKMKTIIHDELAKYEDITLFSG-KEDFSPNIITFGIKGVRGEVLVHAFE 312
           + E + +K I LA I +FS K DF+P+I+ F ++G+RGE LVH E
Sbjct: 243 LTDEKEKADREYAIKEKISKYLAC-KPGIHIFSPLKADFAPHILCFALEGIRGETLVHTLE 302

Query: 313 GHDIFISTTSACSSKAGKPAGTLIAMGISTKLAQTAVRISLDDDNDMGQVEQF 365
           DI+ISTTSAC+SK A TL+AM +A +AVR+S D+ N + + ++F
Sbjct: 303 DQDIYISTTSACASKKADFASTLVAMKTPDAIATSAVRLSFDESNTLEEADEF 355

A related DNA sequence was identified in S.pyogenes <SEQ ID 1869> which encodes the amino acid
sequence <SEQ ID 1870>. Analysis of this protein sequence reveals the following: 

d N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3067 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 268/370 (72%) , Positives - 322/370 (86%) 

Query: 1 MIYFDNSATTIPYPEVLRTYQEVASKIYGNPSSLKELGTTSSRILEASRKQIASLLELKA 60 

MIYFDN+ATTIPY E L+TYQEVA+KIYGNPSSLH+LGT +SRILEASRKQIA LL +K+ 
Sbjct: 1 MIYFDNAATTIPYGEALKTYQEVATKIYGNPSSLHQLGTNASRILEASRKQIAGLLGVKS 60 

Query: 61 NEIFFTSGGTEADNWVIKGLAFEKQHFGNHIIVSDIEHPAWESAKWLGEYGFEIDYAPV 120 

EIFFTSGGTE+ NW IKG+AFEK FG HII+S IEHPAV ES KWL GFE+ YAPV 
Sbjct: 61 EEIFFTSGGTESANWAIKGIAFEKNAFGKHT I ISAIEHPAVSESVKWLLTQGFEVSYAPV 120 

Query: 121 DDKGFVDVFALVKLIKPETILISIMAINNEIGSIQPIIO.ISDLLSDKPTISFHVDAVQAI 180 

+G VDV AL +LI+P+TILISIMA+NNE+G+IQPI+AIS+LL+++PTI+FHVDAVQAI 
Sbjct: 121 TTQGVVDVNALAELIRPDTILISIMAVNNEMGAIQPIRAISNLLANQPTITFHVDAVQAI 180 

Query: 181 GKIPTKDYLTERVDFASFSSHKFHGVRGVGFLYIKEGKRISPLLTGGGQETDLRSTTENV 240 

GKIP DY+T RVD ASFS HKFH VRGVGFLY K GKR++PLL+GGGQE +LRSTTENV 
Sbjct: 181 GKIPLCDYMTNRVDLASFSGHKFHSVRGVGFLYKKAGKRLNPLLSGGGQEQELRSTTENV 240 

Query: 241 AGIAATAKALRMVMDKEVVAIPKISKMKTIIHDELAKYEDITLFSGKEDFSPNIITFGIK 300

AGIA+ AKALR+V +K+V +PK++ M+ +I+ L+ Y D+T+FS +E F+PNI+TFGI+
Sbjct: 241 AGIASMAKALRIVTEKQVSVLPKLTAMRDVIYKSLSAYPDVTVFSAQEGFAPNILTFGIR 300

Query: 301 GVRGEVLVHAFEGHDIFISTTSACSSKAGKPAGTLIAMGISTKLAQTAVRISLDDDNDMG 360

GVRGEV+VHAFE ++I+ISTTSACSSKAG+PAG+L+AMGI K AQTAVRISLDDDNDMG
Sbjct: 301 GVRGEVIVHAFEKYEIYISTTSACSSKAGEPAGSLVAMGIPVKTAQTAVRISLDDDNDMG 360

Query: 361 QVEQFLTIFK 370 

QVEQFLTIF+ 
Sbjct: 361 QVEQFLTIFQ 370 
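
The Identities and Positives figures quoted in an alignment header can be recomputed from the three rows of each alignment block. The following is a minimal Python sketch and not part of the original analysis. The names `alignment_stats` and `percent` are hypothetical. It assumes the usual BLAST convention that the middle row repeats the residue at identical positions, shows `+` at conservative substitutions and is blank elsewhere, and that the three rows are column-aligned. Truncating the percentage (rather than rounding) is also an assumption.

```python
def alignment_stats(query, midline, sbjct):
    """Count identities, positives and gaps in one BLAST-style alignment block.

    Assumes the three rows are column-aligned and of equal length.
    """
    if not (len(query) == len(midline) == len(sbjct)):
        raise ValueError("query, midline and sbjct must have equal length")
    identities = positives = gaps = 0
    for q, m, s in zip(query, midline, sbjct):
        if q == "-" or s == "-":
            gaps += 1            # gap column in either sequence
        elif m == "+":
            positives += 1       # conservative substitution
        elif m != " ":
            identities += 1      # midline repeats the shared residue
            positives += 1       # identities also count as positives
    return {"identities": identities, "positives": positives,
            "gaps": gaps, "length": len(query)}


def percent(count, total):
    """Whole-number percentage, truncated (assumed convention)."""
    return 100 * count // total


# Final block of the GAS/GBS alignment above (positions 361-370).
stats = alignment_stats("QVEQFLTIFK", "QVEQFLTIF+", "QVEQFLTIFQ")
print(stats)
print(percent(268, 370))  # header value "Identities = 268/370" -> 72
```

Summing the per-block counts over all blocks of an alignment gives the totals to compare against the header line. In scanned text the column alignment of the middle row is often lost, so the rows would need to be re-aligned before this check is meaningful.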

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 604 

A DNA sequence (GBSx0644) was identified in S.agalactiae <SEQ ID 1871> which encodes the amino 
acid sequence <SEQ ID 1872>. Analysis of this protein sequence reveals the following: 

N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1539 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 605

A DNA sequence (GBSx0645) was identified in S.agalactiae <SEQ ID 1873> which encodes the amino
acid sequence <SEQ ID 1874>. This protein is predicted to be glutathione reductase (gor). Analysis of this
protein sequence reveals the following:

Possible site: 23

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -4.25 Transmembrane 170 - 186 ( 169 - 187)

Final Results

bacterial membrane --- Certainty=0.2699 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
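
The "Final Results" lines of each localisation report have a regular shape, so the predicted compartment can be extracted programmatically. The following is a minimal Python sketch under stated assumptions: the function names `parse_final_results` and `predicted_location` are hypothetical, and the input is assumed to have been normalised to the form `Certainty=0.2699 (Affirmative)`, with no stray spaces inside the number as can occur in scanned text.

```python
import re

# Matches e.g. "bacterial membrane --- Certainty=0.2699 (Affirmative) < succ>"
RESULT = re.compile(
    r"bacterial\s+(\w+)\s+(?:-+\s+)?Certainty=([0-9.]+)\s*\(([^)]+)\)"
)


def parse_final_results(text):
    """Return a list of (compartment, certainty, verdict) tuples."""
    return [(m.group(1), float(m.group(2)), m.group(3))
            for m in RESULT.finditer(text)]


def predicted_location(text):
    """Return the compartment with the highest certainty, or None."""
    results = parse_final_results(text)
    return max(results, key=lambda r: r[1])[0] if results else None


example = """\
bacterial membrane --- Certainty=0.2699 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
"""
print(predicted_location(example))  # membrane
```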

The protein has homology with the following sequences in the GENPEPT database: 



MSKQYDYIVIGGGSAGSGTANRAAMYGAKVLLIEGGQVGGTCVNLGCVPKKIMWYGAQVS 60 
M+KQYDYIVIGGGS G +ANRAAM+GAKV+L EG QVGGTCVN+GCVPKK+MWYGAQV+
MTKQYDYIVIGGGSGGIASANRAAraGAKVILFEGKQVGGTCVNVGCVPKKVMWYGAQVA 60



121 HTIEVNGQQYKAPHITIATGGHPLYPDIIGSELGETSDDFFGVffiTLPDSILIVGAGYIAA 180 

HT+EV G+ Y APHI IATGGH L PDI GSE G TSD FF + +P +VGAGYIA 
121 HTVEVAGEHYTAPHILIATGGHALLPDIPGSEYGITSDGFFELnAIPKRTAWGAGYIAV 180 

181 EIAGVVNELGVETHLAFRKDHILRGFDDMVTSEVMAEMEKSGISLHANHVPKSLKRDEGG 240

E++GV++ LG ETHL R+D LR FD + ++ EM+K G LH VPK + + +
181 EISGVLHALGGETHLFVRRDRPLRKFDKEIVGTLVDEMKKDGPHLHTFSVPKEVIKNTDN 240



300 IGDVNGKIALTPVAIAAGRRLSERLFNHKDNEKLDYHNVPSVIFTHPVIGTVGLSEAAAI 3 
+GDVNGK+ LTPVA+ AGR+LSERLFNHK K+DY +V +VIF+HPVIG++GLSE A+ 

301 LGDVNGKLELTPVAVKAGRQLSERLFNHKPQAKMDYKDVATVIFSHPVIGSIGLSEEVAL 3



360 EQFGEDNIKVYTSTFTSMYTAVTTNRQAVKMKLITLGKEEKVIGLl
+Q+GE+N+ VY STFTSMYTAVT++RQA KMKL+T+G++EK++GLHG+GYG+DEMIQGF+ 

361 DQYGEENVTVYRSTFTSMYTAVTSHRQACKMKLVTVGEDEKIVGLHGIGYGVDEMIQGFA 420 



A related DNA sequence was identified in S.pyogenes <SEQ ID 1875> which encodes the amino acid 
sequence <SEQ ID 1876>. Analysis of this protein sequence reveals the following: 

Possible site: 23

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -1.33 Transmembrane 173 - 189 ( 173 - 191)

Final Results

bacterial membrane --- Certainty=0.1532 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below:

Identities = 268/446 (60%), Positives = 340/446 (76%), Gaps = 1/446 (0%)

Query: 5 YDYIVIGGGSAGSGTANRAAMYGAKVLLIEGGQVGGTCVNLGCVPKKIMWYGAQVSETLH 64 

YDYIVIGGGSAG +ANRAAM+GAKVLL EG ++GGTCVNLGCVPKK+MWYGAQV++ L
Sbjct: 8 YDYIVIGGGSAGIASANRAAMHGAKVLIAEGKEIGGTCVNLGCVPKKVMWYGAQVADILG 67

Query: 65 KYSSGYGFEVNNLNFDFTTLKANRDAYVQRSRQSYAANFERNGVEKIDGFARFIDNHTIE 124

Y+ YGF+ FDF LKANR AY+ R SY FE+NGV++I +A F D HT+E 

Sbjct: 68 TYAKDYGFDFKEKAFDFKQLKANRQAYIDRIHRSYERGFEQNGVDRIYDYAVFKDAHTVE 127 

Query: 125 VNGQQYKAPHITIATGGHPLYPDIIGSELGETSDDFFGKETLPDSILIVGAGYIAAEIAG 184

+ GQ Y APHI IATGGHP++PDI G++ G +SD FF + +P +VGAGYIA ELAG
Sbjct: 128 IAGQLYTAPHILIATGGHPVFPDIEGAQYGISSDGFFALDEVPKRTAWGAGYIAVEliAG 187

Query: 185 VVNELGVETHLAFRKDHILRGFDDMVTSEVMAEMEKSGISLHANHVPKSLKRDEGGKLIF 244

V++ LG +T L R D LR FD + ++ EM +G LH + + ++ L 

Sbjct: 188 VLHALGSKTDLFIRHDRPLRSFDKTIVDVLVDEMAVNGPRLHTHAEVAKWKNTDESLTL 247 

Query: 245 EAENGKTLVVDRVIWAIGRGPNVD-MGLENTDIVLNDKGYIKADEFENTSVDGVYAIGDV 303 

++G+ + VD++IWAIGR PN++ L+ T + LNDKGYI+ D +ENTSV G+YA+GDV 
Sbjct: 248 YLKDGQEVEVDQLIWAIGRKPNLEGFSLDKTGVTliNDKGYIETDAYENTSVKGIYAVGDV 307 

Query: 304 NGKIALTPVAIAAGRRLSERLFNHKDNEKLDYHNVPSVIFTHPVIGTVGLSEAAAIEQFG 363

NGK+ALTPVA+AAGRRLSERLFN K +EKLDY NV +VIF+HPVIG+VGLSE AA++Q+G
Sbjct: 308 NGKIALTPVAVAAGRRLSERLFNGKTDEKLDYQNVATVIFSHPVIGSVGLSEF^VKQYG 367

Query: 364 EDNIKVYTSTFTSMYTAVTTNRQAVKMKLITLGKEEKVIGLHGVGYGIDEMIQGFSVAIK 423 

++ +K Y S FTSM+TA+T +RQ MKL+T+G EK++GLHG+GYG+DEMIQGF+VAIK 
Sbjct: 368 QEAVKTYQSRFTSMFTAITNHRQPCLMKLVTVGDTEKIVGLHGIGYGVDEMIQGFAVAIK 427 

Query: 424 MGATKADFDDTVAIHPTGSEEFVTMR 449 

MGATKADFD+TVAIHPTGSEEFVTMR 
Sbjct: 428 MGATKADFDNTVAIHPTGSEEFVTMR 453 

SEQ ID 1874 (GBS417) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 79 (lane 5; MW 53kDa). 

GBS417-His was purified as shown in Figure 216, lane 2. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 606 

A DNA sequence (GBSx0646) was identified in S.agalactiae <SEQ ID 1877> which encodes the amino 
acid sequence <SEQ ID 1878>. Analysis of this protein sequence reveals the following: 

Possible site: 35

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3122 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAC62417 GB:AF084104 hypothetical protein [Bacillus firmus] 
Identities = 33/110 (30%) , Positives = 66/110 (60%) 

Query: 1 MANVYDLANELERAVRALPEYQAVLTAKSA 60 

M+NVYD A+EL++A+ E+ A+ + IE+D A+ + ++F Q ++Q+ G 
Sbjct: 1 MSNVYDKAHELKKAIAESEEFSALKBMHEEIEADEIAKKMLENFRNLQLELQQKQMQGIQ 60 

Query: 61 PSQEEQDEMSKLGEKIESNDLLKVYFDQQQRLSVYMSDIEKIVFAPMQDL 110

A related DNA sequence was identified in S.pyogenes <SEQ ID 1879> which encodes the amino acid
sequence <SEQ ID 1880>. Analysis of this protein sequence reveals the following:

Possible site: 38

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4058 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 68/108 (62%), Positives = 86/108 (78%)

Query: 4 VYDLANELERAVRALPEYQAVLTAKSAIESDADAQVLWQDFLATQSKVQEMMQSGQMPSQ 63

+YD AN+LERAVRALPEYQ VL K AI++D A L+ +F+A Q K+Q MMQSGQMP+ 
Sbjct: 5 IYDYANQLERAVRALPEYQKVLEVKEAIQADVSASELFDEFVAMQEKIQGMMQSGQMPTA 64 

Query: 64 EEQDEMSKLGEKIESNDLLKVYFDQQQRLSVYMSDIEKIVFAPMQDLM 111 

EEQ + +L +KIE+ND LK YF+ QQ LSVYMSDIE+IVFAP++DL+ 
Sbjct: 65 EEQTSIQELSQKIEANDQLKAYFEAQQALSVYMSDIERIVFAPLKDLV 112 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 607 

A DNA sequence (GBSx0647) was identified in S.agalactiae <SEQ ID 1881> which encodes the amino 
acid sequence <SEQ ID 1882>. This protein is predicted to be chorismate synthase (aroC). Analysis of this 
protein sequence reveals the following: 
Possible site: 15

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -4.67 Transmembrane 343 - 359 ( 341 - 364)

Final Results

bacterial membrane --- Certainty=0.2869 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:



Query: 1 MRYLTAGESHGPSLTAIIEGIPAGLKLSAKDINEDLKRRQGGYGRGNRMKIETDQVIISS 60

MRYLTAGESHGP LT IIEG PA L+L A DIN DL RRQGG+GRG RM+IE DQV I
Sbjct: 1 MRYLTAGESHGPQLTTIIEGAPAQLELVADDINVDLARRQGGHGRGRRMQIEKDQVQIVG 60

Query: 61 GVRHGKTLGSPITLTVTNKDHSKWLDIMSVEDI--EERLKQKRRIKHPRPGHADLVGGIK 118

G+RHGKT G+PI L V NKD W IM E + +E + KR+I PRPGHADL G IK
Sbjct: 61 GIRHGKTTGAPIALVVENKDWKHl<ITKIMGAEPLTGDEEKEIKRKITRPRPGHADLNGAIK 120

Query: 119 YRFDDLRNALERSSARETTMRVAIGAIAKRILKEIGIEIANHIVVFGGKEITVPDKLTVQ 178

Y D+RN LERSSARETT+RVA GA+AK+IL+ GIE+ +H++ GG + +
Sbjct: 121 YGHRDMRNVLERSSARETTVRVAAGAVAKKILRTFGIEVGSHVLEIGGVKAEKTSYDQLS 180

Query: 179 QIKVLSSQSQVAIVNPSFEQEIKDYIDSVKKAGDTIGGVVETIVGGVPVGLGSYVHWDRK 238

+K L+ S V ++ EQE+ ID K+ GD+IGGVVE IV GVP+GLGS+VH+DRK
Sbjct: 181 NLKELAEASPVRCLDKEAEQEMIAAIDQAKENGDSIGGVVEVIVEGVPIGLGSHVHYDRK 240

Query: 239 LDAKIAQAVVSINAFKGVEFGLGFKSGFLKGSQVMDSISWTKDQGYIRQSNNLGGFEGGM 298

LDAKIA AV+SINAFKGVEFG+GF++ GS+V D I+W +++GY R+SNNLGGFEGGM
Sbjct: 241 LDAKIAAAVNSINAFKGVEFGIGF3AA8KPGSEVKDEIAWDEERGYYRKSNNLGGFEGGM 300

Query: 299 TNGEPIIVRGVMKPIPTLYKPLMSVDIDTHEPYRATVERSDPTALPAAGVVMEAVVATVL 358

TNG PI+VRGVMKPIPTLYKPL SVDI T EP+ A++ERSD A+PAA W EAWA +
Sbjct: 301 TNGMPIVVRGVMKPIPTLYKPLQSVDIATKEPFAASIERSDSCAVPAAAVVAEAWAWEV 360

Query: 359 VTEVLEKFSSDNMYELKEAVK 379

+LE+F +D + E+++ ++
Sbjct: 361 ANALLERFGADQVEEIEKNIR 381

A related DNA sequence was identified in S.pyogenes <SEQ ID 1883> which encodes the amino acid
sequence <SEQ ID 1884>. Analysis of this protein sequence reveals the following:

Possible site: 15
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -0.75 Transmembrane 342 - 358 ( 342 - 359)
INTEGRAL Likelihood = -0.16 Transmembrane 155 - 171 ( 155 - 171)

Final Results

bacterial membrane --- Certainty=0.1298 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 



Query: 1 LRYLTAGESHGPSLTAIIEGIPAGLTLHPADIDHELQRRQGGYGRGARMSIETDRVQISS 60

+RYLTAGESHGP LT IIEG PA L L DI+ +L RRQGG+GRG RM IE D+VQI
Sbjct: 1 MRYLTAGESHGPQLTTIIEGAPAQLELVADDINVDLARRQGGHGRGRRMQIEKDQVQIVG 60



Query: 119 YHFNDLRDALERSSARETTMRVAVGAVAKRILAELGIDMLHHILIFGGITITIPSKLSFR 178 

Y D+R+ LERSSARETT+RVA GAVAK+IL GI++ H+L GG+ S 
Sbjct: 121 YGHRDMRNVLERSSARETTVRVAAGAVAKKILRTFGIEVGSHVLEIGGVKAEKTSYDQLS 180 

Query: 179 ELQERALHSELSIVNPKQEEEIKTYIDKIKKEGDTIGGIIETIVQGVPAGLGSYVQWDKK 238 

L+E A S + ++ + E+E+ ID+ K+ GD+IGG++E IV+GVP GLGS+V +D+K 
Sbjct: 181 NLKELAEASPVRCLDKEAEQEMIAAIDQAKENGDSIGGVVEVIVEGVPIGLGSHVHYDRK 240



Query: 299 TTGQPLvmGVMKPIPTLYKPLMSVDIDSHEPYKATVERSDPTALPAAGVIMENWATVL 358 

T G P+W+GVMKPIPTLYKPL SVDI + EP+ AH-+ERSD A+PAA V+ E WA + 
Sbjct: 301 TNGMPIVVRGVMKPIPTLYKPLQSVDIATKEPFAASIERSDSCAVPAAAVVAEAWAWEV 360 

Query: 



An alignment of the GAS and GBS proteins is shown below: 

Identities = 284/388 (73%), Positives = 333/388 (85%) 

Query: 1 MRYLTAGESHGPSLTAIIEGIPAGLKLSAKDINEDLKRRQGGYGRGNRMKIETDQVIISS 60 

+RYLTAGESHGPSLTAIIEGIPAGL L DI+ +L+RRQGGYGRG RM IETD+V ISS 
Sbjct: 1 LRYLTAGESHGPSLTAIIEGIPAGLTLHPADIDHELQRRQGGYGRGARMSIETDRVQISS 60

Query: 61 GVRHGKTLGSPITLTVTNKDHSKWLDIMSVEDIEERLKQKRRIKHPRPGHADLVGGIKYR 120

GVRHGKT G+PITLTV NKDH KWLD+M+V DIEE LK KRR+KHPRPGHADLVGGIKY
Sbjct: 61 GVRHGKTTGAPITLTVINKDHQKWLDVMAVGDIEETLKLKRRVKHPRPGHADLVGGIKYH 120

Query: 121 FDDLRNALERSSARETTMRVAIGAIAKRILKEIGIEIANHIVVFGGKEITVPDKLTVQQI 180

F+DLR+ALERSSARETTMRVA+GA+AKRIL E+GI++ +HI++FGG IT+P KL+ +++
Sbjct: 121 FNDLRDALERSSARETTMRVAVGAVAKRILAELGIDMLHHILIFGGITITIPSKLSFREL 180

Query: 181 KVLSSQSQVAIVNPSFEQEIKDYIDSVKKAGDTIGGVVETIVGGVPVGLGSYVHWDRKLD 240

+ + S+++IVNP E+EIK YID +KK GDTIGG++ETIV GVP GLGSYV WD+KLD
Sbjct: 181 QERALHSELSIVNPKQEEEIKTYIDKIKKEGDTIGGIIETIVQGVPAGLGSYVQWDKKLD 240

Query: 241 AKIAQAVVSINAFKGVEFGLGFKSGFLKGSQVMDSISWTKDQGYIRQSNNLGGFEGGMTN 300

AK+AQAV+SINAFKGVEFG GF GF KGSQVMD I+WT QGY RQ+N+LGGFEGGMT
Sbjct: 241 AI<IlAQAVLSINAFKGVEFGAGFDMGFQKGSQV^©EITWTPTQGYGRQ , ENHLGGFEGGMTT 300

Query: 301 GEPIIVRGVMKPIPTLYKPLMSVDIDTHEPYRATVERSDPTALPAAGVVMEAVVATVLVT 360

G+P++V+GVMKPIPTLYKPLMSVDID+HEPY+ATVERSDPTALPAAGV+ME WATVL
Sbjct: 301 GQPLVWGVMKPIPTLYKPLMSVDIDSHEPYKATVERSDPTALPAAGVIMENWATVIAK 360

Sbjct: 361 EILETFSSTTMSELQKAFSDYRAYVKQF 388



A related GBS gene <SEQ ID 8617> and protein <SEQ ID 8618> were also identified. Analysis of this
protein sequence reveals the following:

Lipop: Possible site: -1 Crend: 9
McG: Discrim Score: -2.42
GvH: Signal Score (-7.5): -3.23
Possible site: 15

>>> Seems to have no N-terminal signal sequence

ALOM program count: 1 value: -4.67 threshold: 0.0

INTEGRAL Likelihood = -4.67 Transmembrane 343 - 359 ( 341 - 364)
PERIPHERAL Likelihood = 0.69 214
modified ALOM score: 1.43

*** Reasoning Step: 3

Final Results

bacterial membrane --- Certainty=0.2869 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases: 

57.7/73.8% over 354aa

Bacillus subtilis

EGAD|20299| chorismate synthase Insert characterized

SP|P31104|AROC_BACSU CHORISMATE SYNTHASE (EC 4.6.1.4) (5-ENOLPYRUVYLSHIKIMATE-3-PHOSPHATE
PHOSPHOLYASE) (VEGETATIVE PROTEIN 216) (VEG216). Edit characterized

GP|143806|gb|AAA20859.1||M80245 AroF Insert characterized

GP|2634689|emb|CAB14187.1||Z99115 chorismate synthase Insert characterized
PIR|C69590|C69590 chorismate synthase aroF - Insert characterized

ORF0012K301 - 1359 of 1719)

EGAD|20299|BS2267(1 - 355 of 368) chorismate synthase {Bacillus
subtilis}SP|P31104|AROC_BACSU CHORISMATE SYNTHASE (EC 4.6.1.4) (5-ENOLPYRUVYLSHIKIMATE-3-PHOSPHATE
PHOSPHOLYASE) (VEGETATIVE PROTEIN 216) (VEG216).GP|143806|gb|AAA20859.1||M80245
AroF {Bacillus subtilis}GP|2634689|emb|CAB14187.1||Z99115 chorismate synthase {Bacillus
subtilis}PIR|C69590|C69590 chorismate synthase aroF - Bacillus subtilis

%Match = 35.0
%Identity = 57.6 %Similarity = 73.7
Matches = 204 Mismatches = 92 Conservative Sub.s = 57



75 105 135 165 195 225 255 285
IQLSRVAERKNLMPRGISQDIYNMCLKFGLPVHYAEWDKDVLFDILSHDKKASGQFIKIVILPQLGSATVHQIPLEEMRD

315 345 375 405 435 465 495 525
YLEK*MRYLTAGESHGPSLTAIIEGIPAGLKLSAKDINEDLKRRQGGYGRGNRMKIETDQVIISSGVRHGKTLGSPITLT
MRYLTAGESHGPQLTTIIEGVPAGLYITEEDINFELARRQKGKGRGRRMQIEKDQAKIMSGVRHARTLGSPIALV
10 20 30 40 50 60 70

VTNKDHSKWLDIMSVEDI--EERLKQKRRIKHPRPGHADLVGGIKYRFDDLRNALERSSARETTMRVAIGAIAKRILKEI

GIEIANHIVVFGGKEITVPDKLTVQQIKVLSSQSQVAIVNPSFEQEIKDYIDSVKKAGDTIGGVVETIVGGVPVGLGSYV
GIK^/AGHVLQIGAVKAEKTGYTSIEDLQRVTEESPVRCYD3EAGKKI4^AAIDEAKANGDSIGGIVEVIVEGMPVGVGSYV
170 180 190 200 210 220 230

1029 1059 1089 1119 1149 1179 1209 1239
HWDRKLDAKIAQAVVSINAFKGVEFGLGFKSGFLKGSQVMDSISWTKDQGYIRQSNNLGGFEGGMTNGEPIIVRGVMKPI
HYDRIOliDSICLAAAVLSINAFKGVEFGIGFFAAGRNGSEVHDEIIMJEEKGYTRATNRLGGLEGGMTTGMPIVWGV^
250 260 270 280 290 300 310

1269 1299 1329 1359 1389 1419 1449 1479
PTLYKPLMSVDIDTHEPYRATVERSDPTALPAAGVVMEAVVATVLVTEVLEKFSSDNMYN* KKL*NYIAIMLI IFK* KLV
PTLYKPLKSVDIETICEPFSASIERSDSCaVPAASWAEALSLGKLQPStNNSD
330 340 350 360

SEQ ID 8618 (GBS192) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 18 (lane 4; MW 44kDa). 

GBS192-His was purified as shown in Figure 196, lane 4.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 



Example 608 

A DNA sequence (GBSx0648) was identified in S.agalactiae <SEQ ID 1885> which encodes the amino
acid sequence <SEQ ID 1886>. This protein is predicted to be 3-dehydroquinate synthase (aroB). Analysis
of this protein sequence reveals the following:

Possible site: 24

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -3.82 Transmembrane 99 - 115 ( 98 - 116)

Final Results

bacterial membrane --- Certainty=0.2529 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:BAA18068 GB:D90911 3-dehydroquinate synthase [Synechocystis sp.] 
Identities = 138/351 (39%), Positives = 200/351 (56%), Gaps = 4/351 (1%) 

Query: 3 VEVDLPNHPYHIKIEEGCFSEAGDWVSHLWQKQMITIITDSNVEILYGESLVNQLKKQGF 62

+ V LP PY ++I G + D ++ L +I ++++ + YGE ++ L++ G+
Sbjct: 5 IPVPLPQSPYQVQIVPGGLAAIADHLAPLGLGKKIMWSNPEIYDYYGEWIQALQRAGY 64

Query:
Sbjct: 65

Query: 123
+F+Q+PTSL A VD+SIGGKTGVN KN++G F QP VI
Sbjct: 125

Query: 183
EVIKYG+I D +L+ LEE + +ID + D L II SCQ K
Sbjct: 185

Query: 240
LN+GHT+GH +E GYG I HGEAVAIGM +++A
Sbjct: 245

Query: 300
Sbjct: 305

A related DNA sequence was identified in S.pyogenes <SEQ ID 1887> which encodes the amino acid 
sequence <SEQ ID 1888>. Analysis of this protein sequence reveals the following: 

Possible site: 60 
>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -0.43 Transmembrane 97 - 113 ( 97 - 114) 



Final Results 

bacterial membrane --- Certainty=0.1171 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 



Query: 1 MPQTLHVHSRVHDYDILFTDHVLIOTLADCLGERKQ-RICLLFITDQTVYHLYQTLFEEFAQ 59

M T+ V Y + L +AD L +K++ +++ +Y Y + + Q

Sbjct: 1 MATTIPVPLPQSPYQVQIVPGGIAAIADHLAPLGLGKKIMWSNPEIYDYYGEWIQALQ 60

Query: 60 Q--YNAFVHVCPPGGQSKSLERVSAIYDQLIAENFSIOCDMIVTIGGGWGDLGGFVAATY 117

Sbjct

Query: 118 YRGIPYIQIPTTLLSQVDSSIGGKVGVHFKGLTNMIGSIYPPEAIIISTTFLETLPQREF 177
RGI ++Q+PT+LL+ VD+SIGGK GV+ N+IG+ Y P +I L+TLP+REF

Sbjct: 121 LRGINFVQVPTSLLRMVDASIGGKTGVNHPQG:<NLIGAFYQPRLVYIDPVVLKTLPEREF 180

Query: 178 SCGISEMLKIGFIHDRPLFQQLRDFQ KETDKQGLERLIYQSI SNKKRTVEQDEFE 232

G++E++K G I D LF L + + + L ++I +S K +V QDE E

Sbjct: 181 RAGMAEVIKYGVIWDSELFTALEEAEDLSSIDRLPDELLTKIIQRSCQAKVDWSQDEKE 240

Query: 233 NGLRMSLNFGHTLGHAIESLCHHDFYHHGEAIAIGMVVDAKLAVSKGLLPKEDLDSLLQV 292

GLR LN+GHT+GH +ESL + +HGEA+AIGM AK+A GL + D Q+
Sbjct: 241 AGLRAILNYGHTVGHGVESLTGYGVINHGEAVAIGMEAAAKIAHYLGLCDQSLGDRQRQL 300

Query: 293 FERYQLPTTLERADVSATSLFDVFKTDKKNSEQKIIFILPTETGFTTLA 341

+ +LPT + ++ +L DKK + FILPT G T++

Sbjct: 301 LLKTKLPTEMP-PTLAVENLLASLLHDKKVKAGKVRFILPTAIGQVTIS 348

An alignment of the GAS and GBS proteins is shown below:

Identities = 121/332 (36%), Positives = 182/332 (54%), Gaps = 7/332 (2%)



Query: 12 YHIKIEEGCFSEAGDWSHLWQKQMITIITDSNVEILYGESLWQLKKC^FTraVFSFAA 71
Y I + D + Q++++ ITD V LY ++L + +Q + V
Sbjct: YDILFTDHVLKTI^CLGERKQRKLL-FITDQTVYHLY-QTLFEEFAQQ-YNAFVHVCPP 70

Query: 72 GFASKTLEW^IYAF^KHHMTRSDGI IALGG^WGD^FTOSTY^GIHF^IPTSL 131
Sbjct: 71 GGQSKSLERVSAIYDQLIAEWFSKKDMIVTIGGGWGDLGGFVAATYYRGIPYIQIPTTL 130

Query: 132 TAQVDSSIGGKTGVNTSFAH^'GTFAQPDGVLIDPVTLKTLG^ELVEGMGEVIKYGLI 191
+QVDSSIGGK GV-t- NM+G+ P+ ++I Ii+TIi RE G+ E++K G I
Sbjct: 131 LSQVDSSIGGKVGVHFKGLTNMIGSIYPPEAIIISTTFLETLPQREFSCGISEMLKIGFI 190

D L4- L + D +IY S K++ V D+++ GLRM LNFGHT+GHA
Sbjct: 191 HDRPLFQQLRDFQKETDK--QGLERLIYQSISNKKRIVEQDEFENGLRMSLNFGHTLGHA 248

Query: 252 IEVHAGYGEIMHGEAVAIGMIQLSRVAERKNLMPRGISQDIYNMCLKFGLP--VHYAEWD 309
IE + HGEA+AIGM+ +++A K L+P+ + + ++ LP + A+
Sbjct: 249 IESLCHHDFYHHGEAIAIGMVVDAKLAVSKGLLPKEDLDSLLQVFERYQLPTTLERADVS 308

Query: 310 KDVLFDILSHDKKASGQFIKIVILPQLGSATV 341
LFD+ DKK S Q I ++ + G T+
Sbjct: 309 ATSLFDVFKTDKKNSEQHIIFILPTETGFTTL 340

SEQ ID 1886 (GBS336) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 62 (lane 2; MW 42.7kDa). It was also expressed in E.coli as a GST-fusion
product. SDS-PAGE analysis of total cell extract is shown in Figure 67 (lane 5; MW 68kDa).

The GBS336-GST fusion product was purified (Figure 209, lane 4) and used to immunise mice. The
resulting antiserum was used for FACS (Figure 310), which confirmed that the protein is immunoaccessible
on GBS bacteria.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 609 

A DNA sequence (GBSx0649) was identified in S.agalactiae <SEQ ID 1889> which encodes the amino 
acid sequence <SEQ ID 1890>. Analysis of this protein sequence reveals the following: 

Possible site: 47
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3884 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9973> which encodes amino acid sequence <SEQ ID 9974> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAB14240 GB:Z99116 3-dehydroquinate dehydratase [Bacillus subtilis]
Identities = 70/233 (30%), Positives = 127/233 (54%), Gaps = 12/233 (5%)

Query: 2 KIVVPVMPRSLEEA-QEIDLSKFDSVDIIEWRADALPK DDIINVAPAIFEKFAGHE 56

KI++P+M ++ ++ E + K + DI+EWR D K + + + + +
Sbjct: 17 KIIIPLMGKTEKQIimAmVKLMPDIvETOvDVFEKANTOEAVTKLISKLRKSLEDKL 76

Query: 57 IIFTLRTTREGGNIVLSDAEYVELIQKINSIYNPDYIDFEYFSHKEVFQEMLEFPN 112

+FT RT +EGG++ + ++ Y+ L++ + D ID E FS + ++

Sbjct: 77 FLFTFRTHKEGGS^MDESSYLALLESAIQTKDIDLIDIELFSGDAOTKALVSLAEENNV 136

Query: 113 -LVLSYHNFQETP--ENIMEIFSELTALAPRVVKIAVMPKNEQDVLDVMNYTRGFKTINP 169

+V+S H+F++TP +I+ ++ L + K+AVMP + D+L +++ T KIT
Sbjct: 137 yWMSOTDFEKTPVKDEIISRLRKMQDLGMClPKMAVMPMDTGDLIjTLIiDATYTMKTiyA 196

Query: 170 DQVYATVSMSKIGRISRFAGDVTGSSWTFAYLDSSIAPGQITISEMKRVKALL 222

D+ T+SM+ G ISR +G+V GS+ TF + + APGQI +SE++ V +L
Sbjct: 197 DRPIITMSMAATGLISRLSGEVFGSACTFGAGEEASAPGQIPVSELRSVLDIL 249

A related DNA sequence was identified in S.pyogenes <SEQ ID 1891> which encodes the amino acid 
sequence <SEQ ID 1892>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3248 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 160/225 (71%), Positives = 198/225 (87%)
Sbjct:

Query: 61 LRTTREGGNIVLSDAEYVELIQKINSIYNPDYIDFEYFSHKEVFQEMLEFPNLVLSYHNF 120

LRT +EGGNI LS EYV++I++IN+IYNPDYIDFEYF+HK VFQEML+FPNL+LSYHNF
Sbjct: 61 LRTVQEGGNITLSSQEYVDIIKEINAIYNPDYIDFEYFTHKSVFQEMLDFPNLILSYHNF 120

Query: 121 QETPENIMEIFSELTALAPRVVKIAVMPKNEQDVLDVMNYTRGFKTINPDQVYATVSMSK 180

+ETPEN+ME FSE+T LAPRVVKIAVMP+ +EQDVLD+MNYTRGFKT+NP+Q +AT+SM K
Sbjct: 121 EETPENIJIE^SEMTKLAPRVvICIAVMPQSEQDVI^IMm'RGFKTIiNPEQEFATISMGK 180

Query: 181 IGRISRFAGDVTGSSWTFAYLDSSIAPGQITISEMKRVKALLDAD 225

+GR+SRFAGDV GSSWT+ LD PGQ+T+++MKR+ +L+ D
Sbjct: 181 LGRLSRFAGDVIGSSWTYVSLDHVSGPGQVTIiNDMKRIIEVLEMD 225

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 610 

A DNA sequence (GBSx0650) was identified in S.agalactiae <SEQ ID 1893> which encodes the amino 
acid sequence <SEQ ID 1894>. Analysis of this protein sequence reveals the following: 



Final Results 

bacterial cytoplasm --- Certainty=0.1195 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 611

A DNA sequence (GBSx0651) was identified in S.agalactiae <SEQ ID 1895> which encodes the amino 
acid sequence <SEQ ID 1896>. Analysis of this protein sequence reveals the following: 

Possible site: 41
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3431 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database:

>GP:CAB15862 GB:Z99123 alternate gene name: ipa-19d~similar to
hypothetical proteins [Bacillus subtilis]
Identities = 161/396 (40%), Positives = 235/396 (58%), Gaps = 11/396 (2%)

Query: 1 MNKLKVNSVVERKIKSGAQLLEKKDFDTSLVNQ LVQLFSQSN-QFLGMAYLSPQNK 55

M L + KIK G L+EK+ S + LV + S+S +FL Y QNK 

Sbjct: 1 MKLLTLKKAHIU^IKKGYPLIEKEALAGSAGHMKEGDLVDIVSESGGEFLARGYYGLQNK 60 

Query: 56 GIGWLLSRQVFD-FNHDYFVSLFEKSREKRQKFEKSSQTTAYRLFNQDGDNFGGLTIDFY 114 

G+GW L+R + + +F+S K+ + R K ++ TTA+RLFN +GD GG+TID+Y 
Sbjct: 61 GVGWTLTRNKHEQIDQAFFLSKLTKAAQARAKLFEAQDTTAFRLFNGEGDGVGGVTIDYY 120 

Query: 115 SDYALFSWYNEFVYTNRQMIVAAFKQVYPNIKGAYEKIRFKGLDF---ESAHLYGQEAPE 171 

Y L WY++ +YT + M+++A ++ + K YEK RF + + G+ 

SbjCt: 121 DGYLLIQWYSKGIYTFKDMLISALDEMDLDYKAIYEKKRFDTAGQYVEDDDFVKGRRGEF 180 

Query: 172 SFLILENNIKYSVFLNDGLMTGIFLDQHDVRKALATNLSEGKKVLNMFSYTAAFSVAAAV 231

+I EN I+Y+V LN+G MTGIFLDQ VRKA+ ++GK VUJ FSYT AFSVAAA+
Sbjct: 181 PIIIQENGIQYAVDLNEGAMTGIFLDQRHVRKAIRDRYAKBI 

Query: 232 GGALETTSVDLAKRSREI.SKAHFDAMQIVTDNI 

GGA +TTSVD+A RS + F N++ + H VMDVF Y+ YA +K L +D+I++D 
Sbjct: 241 GGAEKTTSVDVANRSLAKTIEQFSVNKmYEMDIK^^VFI^FSYAAKKDLRFDrillLD 300 



Query: 292 PPSFARNKKQTFSVTKDYYKLIEQALDILTPGGTIIASTNAANLTVSQFKKQLEKGFGKA 351 

PPSFAR KK+TFS KDY L+++ + I G I+ASTN++ + +FK ++ F +
Sbjct: 301 PPSFARTKKRTFSAAKDYKWLLKETIAITADKGVIVASTNSSAFGMKKFKGFIDAAFKET 360

Query: 352 SHNYISLQQ--LPEDFTINDKDQQSNYLKVFTIKVK 385

+ Y +++ LPEDF + NYLKV ++ K

Sbjct: 361 NERYTIIEEFTLPEDFKTISAFPEGNYLKVVLLQKK 396



A related DNA sequence was identified in S.pyogenes <SEQ ID 1897> which encodes the amino acid
sequence <SEQ ID 1898>. Analysis of this protein sequence reveals the following:

Possible site: 29

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2699 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below:

Identities = 259/386 (67%), Positives = 315/386 (81%), Gaps = 1/386 (0%)

Query: 1 MNKLKVNSVVERKIKSGAQLLEKKDFDT-SLVNQLVQLFSQSNQFLGMAYLSPQNKGIGW 59
MNKL ++S VE+K+ +G QLL++KDF NQLVQL ++SN+ +G AY+S QNKGIGW
Sbjct: 1 MNKLYIDSFVEKKLTAGVQLLDEKDFSKIKEKNQLVQLVTKSNRPIGTAYISKQNKGIGW 60

Query: 60 LLSRQVFDFNHDYFVSLFEKSREKRQKFEKSSQTTAYRLFNQDGDNFGGLTIDFYSDYAL 119

L + D + YFVSLF ++ KRQ F +S +T AYRLFNQ+GD FGG+TID Y D+A+
Sbjct: 61 YLGPEKIDLSISYFVSLFSVAKAKRQDFAQSDETNAYRLFNQSGDGFGGVTIDLYKDFAV 120

Query: 120 FSWYNEFVYTNRQMIVAAFKQVYPNIKGAYEKIRFKGLDFESAHLYGQEAPESFLILENN 179

FSWYN FVY ++MI+ AF+QV+P +KGAYEK RFKG D E+AHLYG+ A E+F ILEN
Sbjct: 121 FSWYNAFVYDKKEMIMEAFQQVFPEVKGAYEKCRFKGPDTETAHLYGELAQETFSILENG 180

Query: 180 IKYSVFLNDGLMTGIFLDQHDVRKALATNLSEGKKVLNMFSYTAAFSVAAAVGGALETTS 239

I Y VFLN+GLMTGIFLDQHDVR+AL L+ GK +LN+FSYTAAFSVAAA+GGA+ETTS
Sbjct: 181 IAYQVFLNEGLMTGIFLDQHDVRRALVDGLAMGKSLLNLFSYTAAFSVAAAMGGAIETTS 240

Query: 240 VDLAKRSRELSKAHFDANQIVTDNHRFIVMDVFEYYKYAKRKHLSYDVIVIDPPSFARNK 299

VDLAKRSRELS AHF+ NQ+ +H F+VMDVFEY+KYAKRK L +DVIVIDPPSFARNK
Sbjct: 241 VDLAKRSRELSLAHFEHNQIiMrASHHFVVMDVFEYFKYAKRKXLIFDVIVIDPPSFARNK 300

Query: 300 KQTFSVTKDYYKLIEQALDILTPGGTIIASTNAANLTVSQFKKQLEKGFGKASHNYISLQ 359

KQTFSV++DY+KLI +ALDIL+P GTIIASTNAAN+TVSQFKKQ+ KGFG ++LQ
Sbjct: 301 KQTFSVSRDYHKLITEALDILSPKGTIIASTNAAW4TVSQFKKQIIKGFGSRRPESMTLQ 360

Query: 360 QLPEDFTINDKDQQSNYLKVFTIKVK 385 

QLP DFTIN D++SNYLKVFTIKV+ 
Sbjct: 361 QLPSDFTINKADERSNYLKVFTIKVR 386 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 612 

A DNA sequence (GBSx0652) was identified in S.agalactiae <SEQ ID 1899> which encodes the amino 
acid sequence <SEQ ID 1900>. This protein is predicted to be minimal change nephritis transmembrane 
glycoprotein. Analysis of this protein sequence reveals the following: 
Possible site: 30

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -6.85 Transmembrane 129 - 145 ( 126 - 152)
INTEGRAL Likelihood = -4.88 Transmembrane 48 - 64 ( 46 - 69)
INTEGRAL Likelihood = -4.83 Transmembrane 75 - 91 ( 74 - 97)
INTEGRAL Likelihood = -4.52 Transmembrane 16 - 32 ( 15 - )
INTEGRAL Likelihood = -2.28 Transmembrane 163 - 179 ( 163 - 182)

Final Results 

bacterial membrane --- Certainty=0.3739 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
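
The INTEGRAL lines of the report above list each predicted membrane-spanning segment with its likelihood score. A short Python sketch, offered only as an illustration and not as part of the original analysis, shows how such lines could be turned into an ordered list of segments. The function name `transmembrane_segments` is hypothetical, and the sample input uses four of the segments reported above; the 16 - 32 segment is left out because its values are incompletely printed in the source.

```python
import re

# Matches e.g. "INTEGRAL Likelihood = -6.85 Transmembrane 129 - 145 ( 126 - 152)"
SEGMENT = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+(\d+)\s*-\s*(\d+)"
)


def transmembrane_segments(text):
    """Return (start, end, likelihood) tuples sorted by start position."""
    segments = [(int(m.group(2)), int(m.group(3)), float(m.group(1)))
                for m in SEGMENT.finditer(text)]
    return sorted(segments)


example = """\
INTEGRAL Likelihood = -6.85 Transmembrane 129 - 145 ( 126 - 152)
INTEGRAL Likelihood = -4.88 Transmembrane 48 - 64 ( 46 - 69)
INTEGRAL Likelihood = -4.83 Transmembrane 75 - 91 ( 74 - 97)
INTEGRAL Likelihood = -2.28 Transmembrane 163 - 179 ( 163 - 182)
"""
for start, end, likelihood in transmembrane_segments(example):
    # Print each segment with its length in residues.
    print(start, end, end - start + 1, likelihood)
```

Sorting by start position gives the order of the segments along the protein, which is easier to compare between the GBS and GAS reports than the score-ordered listing.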

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAB12545 GB:Z99107 alternate gene name: yetP-similar to 
hypothetical proteins [Bacillus subtilis] 
Identities = 299/676 (44%), Positives = 415/676 (61%), Gaps = 33/676 (4%)

Query: 2 KKIKDFASRAINTRLGFILLLVVIYWLKTIWAYHTDFNLGLENSYQLFLTIINPIPLGLL 61

KK++ + + +L F +L V+++W KT +Y T+FNLG++ + Q L I NP +
Sbjct: 9 KKVEVAMKKLFSYKLSFFVLAVILFWAKTYLSYKTEFI^GVKGTTQEILLIFNPFSSAVF 68

Query: 62 IIGLALYVKRTKAFYITAFITYAIVNILLIANAIYYREFSDFITVSAVLASSKTSAGLGD 121

+GLAL K K+ I I + ++ +L AN ++YR F DF+T + S +GD
Sbjct: 69 FLGLALLAKGRKSAIIMLIIDF-LMTFVLYANILFYRFFDDFLTFPNIKQSGNVG-NMGD 126

Query: 122 SALNLLRIWDLVYVFDFIILIFLFATKKIHLDDRPFNKRASFSITALSGL-LFSINLFLA 180

+++ D+ Y D IILI + + L + KR + S+ LSG+ LF INL A
Sbjct: 127 GIFSIMAGHDIFYFLDIIILIAVLIWRP-ELKEYKMKKRFA-SLVILSGIALFFINLHYA 184

Query: 181 EIDRPELLSRGFSNTYIVKALGLPSFSIYSGNQTYQAQKERNGATAQELATAKKYVAEHY 240
3 DRP+LL+R F YIVK LGL +++IY G QT Q + +R A++ +L +
Sbjct: 185

Query: 241
AKPN EY+G KG+N+I IHLESFQ FLIDYKLN G+E VTPF+N L H E V
Sbjct: 245

Query: 300
NFFHQ GKTSDAE M+NS+FGL GS V GENT + P Xh Q GY+SAV HG
Sbjct: 301

Query: 360
+FWNR+ YK GYD FFD+S + + +N GL DK F +SI
Sbjct: 360

Query: 420
H++YGDH GIS N ++ E+LGK+ ++Y NA QRVP MI +PG KG +
Sbjct: 419

Query: 480
+TYGGE+D +PTLLH+ GID+ KY G DL SKD+ VA R G ++TPKYT+
Sbjct: 476

Query: 540
Y T +G+++ +ET
Sbjct: 532

Query: 600
Sbjct: 591

Query: 660
Sbjct: 646

A related DNA sequence was identified in S. pyogenes <SEQ ID 1901> which encodes the amino acid 
sequence <SEQ ID 1902>. Analysis of this protein sequence reveals the following: 

Possible site: 48 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -6.85 Transmembrane 90 - 106
INTEGRAL Likelihood = -5.68 Transmembrane 146 - 162 ( 139 - 165)
INTEGRAL Likelihood = -4.99 Transmembrane 63 - 79
INTEGRAL Likelihood = -3.98 Transmembrane 178 - 194 
INTEGRAL Likelihood = -0.59 



Final Results 

bacterial membrane --- Certainty=0.3739 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 533/713 (74%), Positives = 603/713 (83%) 

Query: 1 MKKIKDFASRAINTRLGFILLLWIYWLKTIWAYHTDFNLGLENSYQLFLTIINPIPLGL 60

+KK K + INTRLGFI+ L-l- YW+KT+WAYHTDF+L L N YQ+FLTIINPIPL 
Sbjct: 15 VKKFKTLITGFINTRLGFIITLLFCYWIKTLWAYHTDFSLDLGNIYQVFLTIINPIPLAF 75 

Query: 61 LIIGLALYVKRTKAFYITAFITYAIVNILLIANAIYYREFSDFITVSAVLASSKTSAGLG 120

L++G+ALYVK T+AFYI +++ Y I+NILLI+N+IYYREFSDFITVSA+LASSK SAGLG 
Sbjct: 76 LLLGVALYvTOTRAFYICSWVvYIIIiNILLISNSIYYREFSDFITVSAMLASSKVSAGLG 135 

Query: 121 DSALNLLRIWDLVYVFDFIILIFLFATKKIHLDDRPFNKRASFSITALSGLLFSINLFLA 180

DSALNLLRIWD++Y+ DFIILI L KKI D RPFNKRA+F+ITALS LL SINLFLA 
Sbjct: 136 DSAimLRIWDIIYILDFIILISLSIAKKIiCOTQRFratCPAAFAITALSSLLLSINLFLA 195 
Query: 181 EIDRPELLSRGFSNTYIVKALGLPSFSIYSGNQTYQAQKERNGATAQELATAKKYVAEHY 240

EIDRPELL+RGFSNTYIV+ALGLP+ F+ +YSGNQTYQAQKERNGATA+EL K YV HY 
Sbjct: 196 EIDRPELLTRGFSNTYIVRALGLPAFTLYSGNQTYQAQKERNGATAEELIDVKTYVKGHY 255

Query: 241 AKPNPEYYGIGKGRNVIMIHIiESFQQFIiIDYKIJSriDGKEHVVTPFINSLYHSKETVSFSN 300 

A P+P+Y+GIGKG+N+I++HLESFQQFLIDYKL KE+ VTPFINSLYHS T++F N 
Sbjct: 256 AAPDPQYFGIGKGKNIIVLHLESFQQFLIDYKLKEGDKEYEVTPFINSLYHSWATLAFPN 315

Query: 301 FFHQVKAGKTSDAETLMENSLFGLSSGSFMVHYGGENTQFAAPHILAQNGGYSSAVFHGN 360

FFHQVKAGKTSDAET+MENSLFGLh-SGSF^TOGENTQFA p ilaq ggy+savfhgn 
Sbjct: 316 FFHQVKAGICTSDAETMMENSLFGIJ9SGSFMVNYGGENTQFATPSILAQKGGYTSAVFHGN 375 

Query: 361 VGTFWNRNNAYKQWGYDYFFDSSYFSKQTKDNSFQYGLNDKYMFADSIKYLEHMQQPFYT 420

VGTFWNRNNAYKQWGY+YFFDSSYFSKQ NSFQYGLNDKYMF DSIKYLE MQQPFYT
Sbjct: 376 VGTFWNRNNAYKQWGYNYFFDSSYFSKQNSKNSFQYGLNDKYMFKDSIKYLEQMQQPFYT 435



Query: 421 KFITVSNHYPYTSLKGESDEEGFPLAKTNDETINGYFATANYLDTALKSFFEYLKAAGVY 480 

KFITVSNHYPYTSLKGES EEGFPLAKT+DETINGYFATANYLD ALKSFF+YLKA G+Y 
Sbjct: 436 KFITVSNHYPYTSLKGESSEEGFPLAKTDDETINGYFATANYLDAALKSFFDYLKATGLY 495 

Query: 481 DNSIIVMYGDHYGISNTRNPSLAELLGKDPETWSEYDNAMLQRVPYMIHIPGYSKGFISN 540

DNSI V+YGDHYGISN+RN SLA LLGKD ETWSEYDNAMLQRVPYMIHIPGY+ G I
Sbjct: 496 DNSIEVLYGDHYGISNSRNSSLAPLLGKDSETWSEYDNAMLQRVPYMIHIPGYTNGSIKE 555 

Query: 541 TYGGEVDNLPTLLHILGIDTSKYTQLGQDLLSKDNKQMVAMRTTGQYITPKYTNYSGHLY 600

T+GGE+D LPTLLHILGIDTS++ QLGQDLLS N Q+VA RT+G Y+TP+YTNYSG LY 
Sbjct: 556 TFGGEIDALPTLLHILGIDTSQFVQLGQDLLSPQNSQIVAQRTSGTYMTPEYTNYSGRLY 615 

Query: 601 YTDSGQEITNPDETTKAEIKAIRDATNKQLSTSDSIQTGDLLRFDENNGLKTVEVEKFNY 660

T +G EITNPDE T A+ K IR A +QL+ SD+IQTGDLLRFD NGLK ++ +F Y
Sbjct: 616 NTQTGLEITNPDEMTIAKTKEIRSAVAQQLAASDAIQTGDLLRFDTQNGLKAIDPNQFIY 675 

Query: 661 THSLKALKAKERKLKDRSTSIYSKHNNKSTVDLFHAPSYLELQDPNKTHKTSK 713 

T LK LK KL STS+YSK+ +KST LF APSYLEL TS+
Sbjct: 676 TKQLKQLKDISAKLGSESTSLYSKNGHKSTQKLFKAPSYLELNPVEADAATSE 728



A related GBS gene <SEQ ID 8619> and protein <SEQ ID 8620> were also identified. Analysis of this 
protein sequence reveals the following: 



Lipop Possible site: -1 Crend: 9
McG: Discrim Score: 12.63
GvH: Signal Score (-7.5): -2.99
Possible site: 30
>>> Seems to have an uncleavable N-term signal seq
ALOM program count: 5 value: -6.85 threshold: 0.0
INTEGRAL Likelihood = -6.85 Transmembrane 129 - 145 ( 126 - 152)
INTEGRAL Likelihood = -4.88 Transmembrane 48 - 64 ( 46 - 69)
INTEGRAL Likelihood = -4.83 Transmembrane 75 - 91 ( 74 - 97)
INTEGRAL Likelihood = -4.62 Transmembrane 16 - 32 ( 15 - 34)
INTEGRAL Likelihood = -2.28 Transmembrane 163 - 179 ( 163 - 182)
PERIPHERAL Likelihood = 3.76 103
modified ALOM score: 1.87

* Reasoning Step:

Final Results
bacterial membrane --- Certainty=0.3739 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases:



Bacillus subtilis

EGAD|107893| hypothetical protein Insert characterized
GP|2116767|dbj|BAA20118.1||D86418 YfnI Insert characterized
GP|2633039|emb|CAB12545.1||Z99107 alternate gene name: yetP-similar to hypothetical proteins Insert characterized
PIR|D69815|D69815 conserved hypothetical protein yfnI - Insert characterized

ORF00125(286 - 2280 of 2742)
EGAD|107893|BS0726(3 - 64S of 653) hypothetical protein {Bacillus subtilis} GP|2116767|dbj|BAA20118.1||D86418 YfnI {Bacillus subtilis} GP|2633039|emb|CAB12545.1||Z99107 alternate gene name: yetP-similar to hypothetical proteins {Bacillus subtilis} PIR|D69815|D69815 conserved hypothetical protein yfnI - Bacillus subtilis
%Match = 28.5
%Identity = 45.1 %Similarity = 63.1
Matches = 297 Mismatches = 227 Conservative Sub.s = 118

36 66 96 126 156 186 216 246 

1 5 FVvCTRPSLRIDLTVKKVEPTG*LNWYQNLFFPVTEHjI * FFFQRQNSL* VYS*TVL*QIFIFFHTEFDLSLPyVTKFYV 

276 306 336 366 396 426 456 486 

ii*seilslgkklkevptvkkikdfaspaiot , rlgfilllw:™lktiwayhtdfnlglensyqlfltiinpiplglli 

:|: ||:: : : :| | :| |:::| || :| |:||||:: : | | I II = = 

20 mneelkvfkkvevamkklfsyklsffviavilfkaktylsyktefnlgvkgttqeillifnpfssavff 



516 546 576 606 636 666 696 726 

IGLALYVIu^TKAFYITAFITYAIWILLIANAIYYREFSDFITVSAVIASSKTC 
:|||| I |: I I : :: s| || ::|| | ||:| : | :|| :::: |: | :| ||||

LGLALLAKGRKSAI IMLI IDF- LMTFVLYAHILFYRFFDDFLTFPNr- KQSGNVGNMGDGIFSIMAGHDI FYFLDI I ILI 
80 90 100 110 120 130 140 



756 786 843 873 903 933 963 

FLFATKKIHLDDRPFNKRASFSITALSGL-LFSINLFLAEIDRPELLSRGFSNTYIVKALGLPSFSIYSGNQTYQAQKER
: I = II h 111 = II III II 111 = 11 = 1 I llll III = = = ll I I I I = ! I 
-AVLIWRPELI{EYI<MKI<R-FASLVILSGIALFFINLHYAEI<DR?QLLTRTFDRNYIVI<YLGLYKYTIYDGVQTAQTETQR 
160 170 180 190 200 210 220 

993 1023 1053 1083 1113 1143 1173 1200

NGATAQEIATAKKYVAEHYAKPNPEYYGIGKGRIWIMIH^ 

I--: =1 : = I MIMI 11 = 1 11 = 1 = 1 lllllll llllllll 1111 = 1 II 11=11 

AYASSDDLTSVENYTTSHYAKPNAEYFGSAKGKNIIKIHLESFQSFLIDYKLN GEEVTPFLNKLAHGGEDVTYFDN 

240 250 260 270 280 290 300 


1230 1260 1290 1320 1350 1380 1410 1440 

FFHQVKAGKTSDAETLMENSLFGLSSGSF^ra^GGEnTOFAAPHIIAQNGGYSSAVFHG^^VGTFWNRl^AYKQWGYDYFF 
llll lllllll 1=11=111 II I llll =1111 11=111=11= =1111= 11= III II 
FFHQTGQGKTSDAELTMDNSIFGLPEGSAFVT-KGENTYQSLPAILDQKEGYTSAVLHGDYKSFWNRDQIYKHIGYDKFF 
320 330 340 350 360 370 380

1470 1500 1530 1560 1590 1620 1650 1680 

DSSYFSKQTE<DNSFQYGLlTOKYMFADSIKYLEHMQQPFYTKFIWSNKY?YTSIjKGESDEEGFPLAKTNDETINGYFATA 
1=1 = =1 II II I =11 II ==1111 =ll==llll= =111 = I I I l== II II 

DAS-TYDMSDENVIHMGLKDKPFFTESIPKLESLKQPFYAHLITLTNHYPF-NLD-EKDAS-LKKATTGDNTVDSYFQTA

390 400 410 420 430 440 450 

1710 1740 1770 1800 1830 1860 1890 1920 

NYLDTALKSFFEYLKAAGVYDNSIIVMYGDHYGISOTFJ^PSIAELLGKDPETWSEYDNAMLQRVPYMIHIPGYSKGFISN 
55 III ||. ||. || ||:||||:|=:|||| III | := |=||| >:| II III! II >|| || i>: 

RYLDEALEQFFKELKEAGLYDNSVIMIYGDHNGISENHNRAMKEILGK EITDYQNAQNQRVPLMIRVPG-KKGGVNH 

470 480 490 500 510 520 530 

1950 1980 2010 2040 2070 2100 2130 2160 

TYGGEVDWLPTLLHILGIDTSKYTQLGQDLLSKDNKQMVAMRTTGQYITPKYTNYSGHLYYTDSGQEITNPDETTKAEIK
IUII.-I =11111= 111= II =1 11 = 111= II I I = = 11111= =1 I =1 = = = I I 

TYGGEIDVMPTLLHLEGIDSQKYINFGTDLFSKDHDDTVAFR-NGDFVTPKYTSVDNIIYDTKTGEKL KANEETK 

550 560 570 580 590 600 

2190 2220 2250 2280 2310 2340 2370 2400

MPJ3ATNKQLSTSDSIQTGDLLRFDEIWGLKTVEVE 

:: 1=111 111= lllll =1=11= ==l 
NLKTRVNQQLSLSDSVLYKDLLRFHICLNDFKAVDPSDYHYQKEKEIK 
620 630 640 650

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 613

A DNA sequence (GBSx0653) was identified in S.agalactiae <SEQ ID 1903> which encodes the amino 
acid sequence <SEQ ID 1904>. This protein is predicted to be 50S ribosomal protein L20 (rplT). Analysis 
of this protein sequence reveals the following: 

Possible site: 37 
>>> Seems to have no N-terminal signal sequence

Final Results
bacterial cytoplasm --- Certainty=0.3392 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9387> which encodes amino acid sequence <SEQ ID 9388> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAB1484S GB:Z99118 ribosomal protein L20 [Bacillus subtilis]
Identities = 70/89 (78%), Positives = 78/89 (86%)

Query: 1 MFRTAKEQVMNSYYYAYRDRRQKKRDFRKLWITRINAAARMNGLSYSQLMHGLKLAEIEV 60
+++ A +QVM S YA+RDRRQKKRDFRKLWITRINAAARMNGLSYS+LMHGLKL+ IEV
Sbjct: 31 LYKVANQQVMKSGNYAFRDRRQKKRDFRKLWITRINAAARMNGLSYSRLMHGLKLSGIEV 90

Query: 61 NRKMLADLAVNDAAAFTALADAAKAKLGK 89

NRKMLADLAVND AF LADAAKA+L K
Sbjct: 91 NRKMLADLAVNDLTAFNQLADAAKAQLNK 119

A related DNA sequence was identified in S.pyogenes <SEQ ID 1905> which encodes the amino acid 
sequence <SEQ ID 1906>. Analysis of this protein sequence reveals the following: 

Possible site: 27 

>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -0.06 Transmembrane 94 - 110 ( 94 - 110)

Final Results
bacterial membrane --- Certainty=0.1022 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 87/89 (97%), Positives = 88/89 (98%) 

Query: 1 MFRTAKEQVMNSYYYAYRDRRQKKRDFRKLWITRINAAARMNGLSYSQLMHGLKLAEIEV 60

+FRTAKEQVMNSYYYAYRDRRQKKRDFRKLWITRINAAARMNGLSYSQLMHGLKLAEIEV
Sbjct: 31 LFRTAKEQVMNSYYYAYRDRRQKKRDFRKLWITRINAAARMNGLSYSQLMHGLKLAEIEV 90

Query: 61 NRKMLADLAVNDAAAFTALADAAKAKLGK 89
NRKMLADLAV DAAAFTALADAAKAKLGK
Sbjct: 91 NRKMLADLAVADAAAFTALADAAKAKLGK 119

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
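
The alignment summaries in these examples all follow the pattern "Identities = a/b (p%), Positives = c/d (q%)", sometimes with an additional "Gaps" field. Purely as an illustration, and not as part of the original analysis, a minimal Python sketch for reading such a line into numbers might look like the following. The function name parse_alignment_summary is hypothetical, and the pattern is an assumption chosen to tolerate the stray spaces that appear in some summary lines.

```python
import re

# One field of a summary line, e.g. "Identities = 87/89 (97%)".
# Optional whitespace is allowed around "=", "/" and inside the parentheses.
FIELD = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)\s*\(\s*(\d+)\s*%\s*\)"
)


def parse_alignment_summary(line):
    """Return {field: (count, length, printed_percent)} for one summary line."""
    return {
        name: (int(count), int(length), int(percent))
        for name, count, length, percent in FIELD.findall(line)
    }


if __name__ == "__main__":
    summary = parse_alignment_summary(
        "Identities = 87/89 (97%), Positives = 88/89 (98%)"
    )
    for name, (count, length, percent) in summary.items():
        print(f"{name}: {count} of {length} aligned positions ({percent}% as printed)")
```

The sketch keeps the percentage exactly as printed rather than recomputing it, since the rounding convention of the program that produced the summaries is not stated here.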
Example 614

A DNA sequence (GBSx0654) was identified in S.agalactiae <SEQ ID 1907> which encodes the amino 
acid sequence <SEQ ID 1908>. Analysis of this protein sequence reveals the following: 

Possible site: 21
>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -0.64 Transmembrane 32 - 48 ( 32 - 48)
INTEGRAL Likelihood = -0.32 Transmembrane 3 - 19 ( 3 - 19)

Final Results
bacterial membrane --- Certainty=0.1256 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 
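
Each "Final Results" block in these examples lists three candidate locations with a certainty value and a verdict. As an illustration only, the following Python sketch shows one way such a block could be read and the highest-certainty location selected. The function names parse_final_results and most_likely_location are hypothetical, the sample lines are merely written in the same format as the blocks in this document, and the pattern is an assumption that tolerates stray spaces inside the number.

```python
import re

# Matches lines such as
#   "bacterial cytoplasm --- Certainty=0.3392 (Affirmative) < succ>"
# allowing dashes or spaces between the location and "Certainty",
# and stray spaces inside the number (e.g. "0 . 0000").
RESULT = re.compile(
    r"bacterial\s+(\w+)[\s\-\u2014]*Certainty\s*=\s*([\d.\s]+?)\s*\(([^)]*)\)"
)


def parse_final_results(text):
    """Return {location: (certainty, verdict)} for a Final Results block."""
    results = {}
    for location, certainty, verdict in RESULT.findall(text):
        value = float(re.sub(r"\s+", "", certainty))  # drop stray spaces
        results[location] = (value, verdict.strip())
    return results


def most_likely_location(results):
    """Return the location with the highest certainty value."""
    return max(results, key=lambda location: results[location][0])


if __name__ == "__main__":
    sample = """
    bacterial cytoplasm --- Certainty=0.3392 (Affirmative) < succ>
    bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
    bacterial outside --- Certainty=0 . 0000 (Not Clear) < succ>
    """
    parsed = parse_final_results(sample)
    print(most_likely_location(parsed), parsed)
```

Selecting the maximum is only a convenience for summarising a block; it does not reproduce how the prediction program itself reaches its verdicts.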



Example 615 

A DNA sequence (GBSx0655) was identified in S.agalactiae <SEQ ID 1909> which encodes the amino 
acid sequence <SEQ ID 1910>. Analysis of this protein sequence reveals the following:

Possible site: 33 

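>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood = -12[illegible] Transmembrane 747 - 763 ( 743 - 772)
INTEGRAL Likelihood = -12[illegible] Transmembrane 840 - 856 ( 835 - 856)
INTEGRAL Likelihood = [illegible] Transmembrane 447 - 463 ( 440 - 466)
INTEGRAL Likelihood = [illegible] Transmembrane 351 - 367 ( 346 - 372)
INTEGRAL Likelihood = [illegible] Transmembrane 517 - 533 ( 516 - 537)
INTEGRAL Likelihood = [illegible] Transmembrane 397 - 413 ( 395 - 413)
INTEGRAL Likelihood = [illegible] Transmembrane 799 - 815 ( 799 - 817)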



Final Results
bacterial membrane --- Certainty=0.6052 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



A related GBS nucleic acid sequence <SEQ ID 9349> which encodes amino acid sequence <SEQ ID 9350> 
was also identified. 



The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAB89436 GB:AE000977 A. fulgidus predicted coding region AF1820 
[Archaeoglobus fulgidus] 
Identities = 100/483 (20%), Positives = 210/483 (42%), Gaps = 61/483 (12%)

Query: 351 LFPIILYLVAALVTLTTMTRFVEEERTNAGILKALGYSDRQVIFKFIIYGFIAGTLGTTL 410 

LFP LV+ +T ++R + N +++ALG++ +++■ ++ Y + G +T 
Sbjct: 276 LFPAFFILVSIFMTYALLSRIFRLQLGNIAVMRALGFTRNEIMLHYLQYPLLMGFFASTA 335 

Query: 411 GIIGGHYLLPRIISDIISKDLTIPNTQYHLFLNYSLLAFVFSLLSIVLPVFVI 463 

G++ G + + S I+ L +P L L+ + L+ + F++

Sbjct: 336 GLVAGFFASQLLTSQYIT-FLNLPYYVSKPHLEVYSLSLMAGTLTPTISGFLVAYQASRV 394

Query: 464 TRRELKEKAAFLLLPKPPAKGSKIALEYINWIWKKLSFTQKVTARNIFRYKQRMIM 519

R E AA + + A S+I W ++ ++ RNIFR K+R + 
Sbjct: 395 DIVKALRGYAEVAAVSFIARIDALFSRI W RMRLIFRLALRNIFRSKRRTAI 445 



Query: 520 TIFGVAGSVALLFSGLGIQSSLKQTVNEHFGRIMPYDILLTYNTNASPPKILELLSKDSK 579
Sbjct

Query: 580 IDKY QP I HLENLDES IPGQINKQS ISLFITDKKQLLPFI YLQEATTNKSLHL 631 

+D PI++E E++P +L I Q L +Y E + 
Sbjct: 502 MDGVLFAEPAVEMPIYVEKGGEAVP TLLIASNFQTLYNVYNAEG EKLI 549 

Query: 632 NNKGIIISKKIAQFYHVNTGDFIHL SHSQTLPSRKLKITGVWANVGHYIFMTK 685 

++GII SK + + G+ + + ++ + + V A++ 
Sbjct: 550 PSEGIIFSKTAMKNLSLVEGEKVSVYTEFGKLEaEVEDVEMIPiLSVATASL 601 

Query: 686 QYYRTIFKIGlAKDNAFLVKLTKHKIANNLiAEKLL 745 

Y+ I + N +V + H-IA +AEK+ +++GV+ ++ S+E ++ 

Sbjct: 602 DYFSRISGVDG-FNRIWDADEGRIA-EIAEKIRQtCIGWKVSTVIEAQESIEELMGFFY 659 

Query: 746 GSMTILVVVSLLLAIVILYNLTNINLAERKRELSTIKVLGFYNEEVTLYIYRETIILSTI 805

+ + + L ++N T+I++ ER REL+T+++LG+ + E+ + + E + ++ +
Sbjct: 660 AFIAFSLFFGVSLGFAAVFNTTSISVIERSRELATLRMLGYTSREIIISLILENLFVAIL 719

Query: 806 GVI 808 

Sbjct: 720 GLV 722 



A related DNA sequence was identified in S. pyogenes <SEQ ID 1911> which encodes the amino acid
sequence <SEQ ID 1912>. Analysis of this protein sequence reveals the following:



Possible site: 34
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -14.33 Transmembrane 749 - 765 ( 739 - 775)
INTEGRAL Likelihood = -10.88 Transmembrane 845 - 861 ( 834 - 865)
INTEGRAL Likelihood = -6.64 Transmembrane 350 - 366 ( 344 - 369)
INTEGRAL Likelihood = -6.53 Transmembrane 22 - 38 ( 19 - 42)
INTEGRAL Likelihood = [illegible].32 Transmembrane 520 - 536 ( 515 - 537)
INTEGRAL Likelihood = [illegible] Transmembrane 446 - 462 ( 445 - 465)
INTEGRAL Likelihood = -2.92 Transmembrane [illegible] ( 395 - 413)
INTEGRAL Likelihood = -0.80 Transmembrane 800 - 816







Final Results 

bacterial membrane --- Certainty=0.6731 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases: 

>GP:AAB89436 GB:AE000977 A. fulgidus predicted coding region AF1820 
[Archaeoglobus fulgidus] 
Identities = 101/542 (18%), Positives = 237/542 (43%), Gaps = 42/542 (7%)



Query: 350 IFPWLYLVAALVAFTTMTRYVDEERTSSGLLKAIGYSNKDISLKFLIYGLLASFLGTTL 409 

+FP LV+ + + ++R + + +++A+G++ +I L +L Y LL F +T
Sbjct: 276 LFPAFFILVSIFMTYALLSRIFRLQLGNIAVMRALGFTRNEIMLHYLQYPLLMGFFASTA 335

Query: 410 GIIGGTYLLSTLISEILTGA LTIGKTHLYSYWFYNGIAYLLAMLSAVLPAYLIVKKE 466 

G++ G + L S+ +T + K HL Y L +S L AY + + 

Sbjct: 336 GLVAGFFASQLLTSQYITFLNLPYYVSKPHLEVYSLSLMAGTLTPTISGFLVAYQASRVD 395

Query: 467 LFLN AAQLLLPKPPSKGAKIWLEHLTFVWKALSFTHKVTIRNIFRYKQRMLMT 519 

+ AA + + + ++IW L F ++ +RNIFR K+R ++ 

Sbjct: 396 IVKALRGYAEVAAVSFIARIDALFSRIWRMRLIF RLALRNIFRSKRRTAIS 446 

Query: 520 IVGVAGSVALLFAGLGIQSSIAKVVEHQFGDLTTYDILAVGSAKATATEQTDLASYLKQE 579 

I + +L+ + S V++ QFG + YDI + L Y +E 

Sbjct: 447 IFSIVACTSLILNSMVFVDSFDYVMQLQFGKvYAYDI KVSLEGYDGKE 494 

Query: 580 PITGYQKVSYASLTLPVKGLP---DEQSISILSSS-ATSLSPYFNLLDSQEQKKVPIPTS 635

4- +K+ P +P +K ++ + A++ +N+ +++ +K IP+ 

Sbjct: 495 VLEKWKMDGvLFAEPAVEMPIYVEKGGEAVPTLLIASNFQTLYNVYNAEGEKL--IPSE 552 
Query: 636 GVLISEKLASYYKVKPGDQLVLTDRKGQSYKVTIKQVIDMTOGHYLIMSDTYFKNHFKGL 695

G++ S+ + G+++ + G+ ++ ++ L+ T ++F + 

Sbjct: 553 GIIFSKTAMKNLSLVEGEKVSVYTEFGK LEAEVEDVEMIPLLSVATASLDYFSRI 607 

Query: 696 EAAPAYLIKVKDKDSKHIKETASDLLTLKAIRAVSQNVNHIKSVQLVVTSLNQVMTLLVF 755

+ VDD IBA+ + ++VS+ +S++ ++ + 4-F 

Sbjct: 608 SGVDGFNRIWDADEGRIAEIAEKIRQMDGVKICVSTVIEAQESIEELMGFFYAFIAFSLF 667 

Query: 756 LSILLAIVILYNLTTINIAERIRELSTIKVLGFYDQEVTLYIYRETISLSLVGILLGIYL 815

+ L ++N T+I++ ER REL+T+++LG+ +E+ + + E + ++++G++ + +
Sbjct: 668 FGVSLGFAAVFNTTSISVIERSRELATLRMLGYTSREIIISLILENLFVAILGLVFALPI 727

Query: 816 GKGLHTYIMTMISTGDIQFGVKVDAYVYLVPILVILSLLAVLGIWVNRHLKKVDMLEALK 875

+ ++ ++ ++L + +++ + + R + ++D+ + K 

Sbjct: 728 AYSTAYFFFSSFESELYYMPMVIYPRTFAATVLAVFAIILtALLPSARRVSEMDIAKVTK 787 

Query: 876 SI 877 

I 

Sbjct: 788 EI 789 

An alignment of the GAS and GBS proteins is shown below: 

Identities = 377/857 (43%), Positives = 543/857 (62%), Gaps = 7/857 (0%)

Query: 3 KTFWKDIYRSITTSKGRFSSILLLMMLGSFAFIGLKVSAPNMQRTAQNYLAHHHVMDITV 62

KT WKDI R+I SKGRF S+ LM LGSFA +GLKV+ P+M+RTA YL H VMD+TV
Sbjct: 4 KTLWKDILRAIKNSKGRFISLFFLMALGSFALVGLKVTGPDMERTASRYLERHQVMDLTV 63

Query: 63 FNSWGLDKHDQTVLESLKGSQVEFSYFVDTTPQQNSKSYRLYSNTKTISTFDLVKGRLPL 122

S + D+ L++LKG+ +E+ + +D + N KS RLYS K +S LVKG P 
Sbjct: 64 LASHQFSQADKQELDTLKGAHLEYGHLLDVSLTSNQKSLRLYSVPKKVSKPVLVEGSWPK 123 

Query: 123 NKSEIALSFQERKKYAIGDKINFKQDKNKLFSNTGPLTIVGFVNSTEIWSKTNLGSSQTG 182 

++++ LS K Y IGD++ L + T +VGF NS+E+WSK+NLGSS TG 

Sbjct: 124 RETDLVLSSSLAKNYQIGDELAVTSPMEGLLTTTH-FQVVGFANSSEvWSKSNLGSSSTG 182 

Query: 183 DGDLDSYGVLDKTAFHSPVYTMARVTFKDLRLINPFSISYKEKVAKYQEKVSRKLNIHNK 242

DG L +Y ++ F S + + R+ F LRL N FS Y+++V + Q + L + + 
Sbjct: 183 DGSLYAYAFV1WNVFKS-AFNLLRIRFSHLRLTNAFSKDYQKRVTQNQAHLDNLLKDNGQ 241 

Query: 243 IRYTKTKKESLRKIDEEEKSLLKAQKQINRLDNDSLAMPLSQRQAIQMKIKQDRLSLLKR 302 

RY++ + +LK+4-+++SQ + +I+Q + +L K 

Sbjct: 242 KRYDDLQNQYDLALKNGRAAIiAKETVKLAASEENLTFLEGSALQEAKHQIEQGKQALAKE 301 

Query: 303 TKELLKLRHNTQIMESPQIIVYNRTTFPGGQGYNTFDSSraSTSKISNLFPIILYLVAAL 362 

K+L +++ +E P + YNR+T PGG+GY+T+ +ST S S + N+FP++LYLVAAL 

Sbjct: 302 EKQLEQVQATKDKLEKPSYLTYNRSTLPGGEGYHTYATSTTSISNVGNIFPVVLYLVAAL 361 

Query: 363 VTLTTMTRFVEEERTNAGILKALGYSDRQVIFKFIIYGFIAGTLGTTLGIIGGHYLLPRI 422

V TTMTR+V+EERT++G+LKA+GYS++ + KF+IYG +A LGTTLGIIGG YLL + 
Sbjct: 362 VAFTTMTRYVDEERTSSGLLKAIGYSNKDISLKFLIYGLLASFLGTTLGIIGGTYLLSTL 421 

Query: 423 ISDIISKDLTIPNTQYHLFLNYSLLAFVFSLLSIVLPVFVITRRELKEKAAFLLLPKPPA 482

IS+I++ LTI T + + Y+ +A++ ++LS VLP ++I ++EL AA LLLPKPP+
Sbjct: 422 ISEILTGALTIGKTHLYSYWFYNGIAYLLAMLSAVLPAYLIVKKELFLNAAQLLLPKPPS 481

Query: 483 KGSKIALEYINWIWKKLSFTQKVTARNIFRYKQRMIMTIFGVAGSVALLFSGLGIQSSLK 542 

KG+KI LE++ ++WK LSFT KVT RNIFRYKQRM+MTI GVAGSVALLF+GLGIQSSL
Sbjct: 482 KGAKIWLEHLTFVWKALSFTHKVTIRNIFRYKQRMLMTIVGVAGSVALLFAGLGIQSSLA 541

Query: 543 QTVNEHFGRIMPYDILLTYNTNASPPKILELLS--KDSKIDKYQPIHLENLDESIPGQIN 600

+ V FG + YD IL + A+ + +L S K I YQ + +L + G +
Sbjct: 542 KVVEHQFGDLTTYDILAVGSAKATATEQTDLASYLKQEPITGYQKVSYASLTLPVKGLPD 601



Sbjct: 602 KQSISILSSSATSLSPYFNLLDSQEQKKVPIPTSGVLISEKLRSYYKVKPGDQLVLTDRK 661

Query: 661 TLPSRKLKITGVVNiWVGHYIFMTKQYYRTIFKXEXXDN 718

S K+ I V++ VGHY+ M+ Y++ FK A+L+K+ K A L 

Sbjct: 662 G-QSYKVTIKQVIDMWGHYLIMSDTYFKNHFKGLEAAPAYLIKVKDKDSKHIKETASDL 720

Query: 719 LEINGVESLTQNALQIASVFAVVRSLDGSMTILVVVSLLLAIVILYNLTNINLAERKREL 778

L + + +++QN + SV+ W SL+ MT+I.V +S+LLAIVILYNLT IN+AER REL
Sbjct: 721 LTLKAIRAVSQNVNHIKSVQLVVTSLNQVMTLLVFLSILLAIVILYNLTTINIAERIREL 780

Query: 779 STIKVLGFYNEEVTLYIYRETIILSTIGVILGTISGTYLHRQMMLLIGSDQILFGEKVSP 838 

STIKVLGFY++EVTLYIYRETI LS +G++LG G LH +M +1 + I FG KV 
Sbjct: 781 STIKVLGFYDQEVTLYIYRETISLSLVGILLGIYLGKGLHTYIMTMISTGDIQFGVKVDA 840

Query: 839 TTFIIPISVWI1LXXL 855 

+++PI V++ +L L 
Sbjct: 841 YVYLVPILVILSLLAVL 857

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 616 

A DNA sequence (GBSx0656) was identified in S.agalactiae <SEQ ID 1913> which encodes the amino 
acid sequence <SEQ ID 1914>. This protein is predicted to be ABC transporter, ATP-binding protein. 
Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results
bacterial cytoplasm --- Certainty=0.2757 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAB89431 GB:AE000977 ABC transporter, ATP-binding protein 
[Archaeoglobus fulgidus] 
Identities = 112/230 (48%), Positives = 167/230 (71%)

Query: 4 IEMKHSYK^YQTGETEIVAt^NDISFSIERGELWILGASGAGKSTVlNILGGMDSNSEGE 63 

+ ++ +K YQ G+ E+ A 1+ IERGE +V+LG SG GK+T+LNI+GG+D + G 
Sbjct: 2 LRLEDVWKTOQMGKVEVSALRGINLEIERGEFM^'LGPSGCGKTTMLNriGGIDRPTRGR 61 

Query: 64 VLIDGFOTIANYTIRELTRYRRYDVGFVFQFYNLVPl&TALEKVELASEIVPKALDAQQAL 123 

V+ DGK+I NY LT 4RR +VGF+FQF+NL+P LTA ENVE+A+++V D + L 
Sbjct: 62 VIFDGKDITimiEDRLTMHRPJMVGFIFQFFNLIPTLTARENVEIAaDLvBSPRDTOEVL 121 

Query: 124 ENVGLGHRINHFPAQLSGGEQQRVAIARAIAKKPKLBLCDEPTGALDYQTGKQVIAILQK 183 

+ VGL R HFPA+LSGGEQQRVAIARA+ K P ++L DEPTG+LD++TGK VL ++++ 
Sbjct: 122 KWGIMRAEHFPAELSGGEQQRVAIARALVKNPPIII^EPTGSLDFETGKAVLKVMRE 181 

Query: 184 MAQSKETTVIIVTHNTAIAPIRNRVIHMHDSKISDIVINENPSDIQNIEY 233 

+ + + T ++VTHN+A+A IA+RV+++ D K+ + N +P+D I++ 
Sbjct: 182 INRKEGITFVIWHNSAIAAIADRVVYLRDGKVERVERNLHPADPDEIQW 231 

There is also homology to SEQ ID 1354. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 617

A DNA sequence (GBSx0657) was identified in S.agalactiae <SEQ ID 1915> which encodes the amino 
acid sequence <SEQ ID 1916>. This protein is predicted to be DNA topoisomerase I (topA). Analysis of 
this protein sequence reveals the following: 

Possible site: 34
>>> Seems to have no N-terminal signal sequence

Final Results
bacterial cytoplasm --- Certainty=0.4716 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9821> which encodes amino acid sequence <SEQ ID 9822> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database:

>GP:CAB13485 GB:Z99112 DNA topoisomerase I [Bacillus subtilis] 
Identities = 442/690 (64%), Positives = 535/690 (77%), Gaps = 10/690 (1%)

Query: 27 LVIVESPAKAKTIEKYLGRNYKWASVGHIRDLKKSSMSIDFENNYEPQYINIRGKGPLI 86
LVIVESPAKAKTIE+YLG+ YKV AS+GH+RDL KS-M +D H N+EP+YI IRGKGP++

Sbjct: 5 LVIVESPAKAKTIERYLGKKYKVKASMGHVRDLPKSQMGVDIEQNFEPKYITIRGKGPVL 64 

Query: 87 NDLKKEAKKAKKWI^SDPDREGEAISWHLAHItX)LDKEDRNRWFNEITKI3AVK3^FVE 146 
+LK AKKAKKVYLA+DPDREGEAI+WHLAH LDLD RWFNEITKDA+K +F 

25 Sbjct: 65 KELKTAAKKAKKVYLAADPDREGEAIAWHLAHSLDLDMSDCRVVFNEITKIIAIKESFKH 124 

Query: 147 PRQINMDLVDACflARRVLDRIVGYSISPILWKKVKKGLSAGRVQSVALKLIIDRENEIKA 206 

PR INMDLVDAQQARR+LDR+VGY ISPILWKKVKKGLSAGRVQSVAL+LIIDRE EI 
Sbjct: 125 PRMINMDLVDAQQARRILDRLVGYKISPILWKKVKKGLSAGRVQSVALRLIIDREKEIND 184 


Query: 207 FQPEEYWTIDGSFKKGTRKFNATFYGLDGKKFKLSNNEDVKTULKRIKTDEFLVEKVEKK 266 

F+PEEYWTIDG+F KG F A+F+G +GKK L++ DVK +L ++K +++ VEKV KK 
Sbjct: 185 FKPEEYWTIDGTFLKGQETFEASFFGPCNGKKLPLNSEADVKEILSQLKGNQYTVEECVTKK 244 

Query: 267 ERRRNAPLPYTTSSLQQDAANKINFRTRKTMMIAQQLYEGLSLGTAGHQGLITYMRTDST 326

ER+RN LP+TTS+LQQ+AA K+NFR +KTMMIAQQLYEG+ LG G GLITYMRTDST 
Sbjct: 245 ERKRNPALPFTTSTLQQEAARKLNFRAKK^WIAQQLYEGIDLGREGTVGLITYMRTDST 304 

Query: 327 RISPLAQNEATEFITNRFGANYSKHGNK-VKNASGAQDAHEAIRPSSVNHTPESIAKYLD 385 
RIS A +EA FI +G + K K AQDAHEAIRP+SV P + L

Sbjct: 305 RISNTAVDEAAAFIDQTYGKEFLGGKKKPAKKNENAQDAHEAIRPTSVLRKPSELKAVLG 364

Query: 386 KDQLKLYTLIl^FIASQMTAAVFD^MKV^^TQNGVTFIANGSQVKFDGY^VYND 441 

+DQ++LY LIW RF+ASQM AV DTM V+LT NG+TF ANGS+VKF G+M VY + 
Sbjct: 365 RDQrKLYKLIWERFVASQMAPAVLDTMSTOLTI^GLTFRANGSKWFSGFMKVYVEGKDD 424

Query: 442 --TDKNKMLPDMEEGESWKVNTNPEQHFTQPPARFSEASLIKTLEENGVGRPSTYAPTL 499 

+K4+MLPD++EG++V + PEQHFTQPP R++EA L+KTLEE G+GRPSTYAPTL 
Sbjct: 425 QMEEKDRMLPDLQEGDTVLSKDIEPEQHFTQPPPRYTEARLVKTLEERGIGRPSTYAPTL 484 


Query: 500 ETIQKRYYVKLAAKRFEPTELGEIVNSLIVE7FPDIVDVTFTAEMEGKLDEVEIGKEQWQ 559 

+TIQ+R YV L KRF FTELG+IV LI+EFFP+I++V FTA+ME LD VE G +W 
Sbjct: 485 DTIQRRGYVALDNKRFVPTELGQIVLDLIKEFFPEIINVEFTAKMERDLDHVEEGNTEWV 544 

Query: 560 KIIDEFYKPFEKELAKAETEMEKIQII03EPAGFDCELCGSPMVIKLGRYGKFYACSNFPE 619

KIID FY FEK + KAE+EM++++I+ E AG DCELC SPMV K+GRYGKF ACSNFP+ 
Sbjct: 545 KIIDNFYTDFEKRVKKAESEMICEVEIEP3YAGEDCELC3SP^IVYKMGRYGKFLACSNFPD 604 

Query: 620 CHNTKAITKEIGVICPICQKGQVIERKTKRNRIFYGCDRYPECEFTSWDKPIGRTCPKSN 679 
C NTK I K+IGV CP C +G H-+ERK+K+ R+FYGCDRYP+CEF SWDKPI R CPK

Sbjct: 605 CRNTKPIVKQIGVKCPSCGEGNIYERKSKKKRVFYGCDRYPDCEFVSWDKPIERKCPKCG 664 
Query: 680 DFLVEKKVRGGGKQWCSNEKCDYQEEKIK 709

LVEKK++ G QV C +CDY+EE K
Sbjct: 665 KMLVEKKLK-KGIQVQC--VECDYKEEPQK 691

A related DNA sequence was identified in S.pyogenes <SEQ ID 1917> which encodes the amino acid 
sequence <SEQ ID 1918>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results
bacterial cytoplasm --- Certainty=0.5445 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 595/704 (84%), Positives = 656/704 (92%), Gaps = 1/704 (0%) 



[Sequence rows for Query 6 to 666 and Sbjct 7 to 667 are not recoverable from the source. The surviving match lines are:]

IDF+NNYEPQYINIRGKGPLIN LKKEAK AKKVYIASDPDREGEAISWHL+HIL LD +
D NRWFNEITKDAVK+AFVEPRQI+MDLVD+QQARRVLDRIVGYSISPILWKKVKKGLS
AGRVQSVALKLIIDREN+IKAF P+EYW+IDG FKKGT+KF ATFYG+-GKK KL NN D
++FLV KV+KKERRRNAPLPYTTSSLQQDAANKINFRTRKTMM+AQQLYE
G+ LG G QGLITYMRTDSTRISP+AQN+A +FI NRFGANYSKHGN+VKN S
EAIRPSSVNHTP+SIAKYL+KDQLKLYTLIIVNRF+ASQMTAAVFDT+ICVNrj QNGV F+A
ENGVGRPSTYAPTLE IQ+RYYVKL+AKRFEPTELGEIVN LIVEFFPDIVDV FTAEME
GKLD+VEIG+EQWQ +ID+FY+PF KEL KAE+E+EKIQIKDEPAGFDC++CG PMVIKL
GR+GKFYACSNFPEC NTKAITKEIGV CP+C KGQVTERKTK+NRIFYGCD+YP+CEF
Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 618 

A DNA sequence (GBSx0658) was identified in S.agalactiae <SEQ ID 1919> which encodes the amino 
acid sequence <SEQ ID 1920>. Analysis of this protein sequence reveals the following:

Possible site: 43 

>>> Seems to have no N-terminal signal sequence

Final Results
bacterial cytoplasm --- Certainty=0.2578 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAD35341 GB:AE001708 DNA processing chain A [Thermotoga maritima]
Identities = 97/231 (41%), Positives = 149/231 (63%), Gaps = 2/231 (0%)

Query: 51 FIENYKQLDLKKLRQEFKKFPV--LSILDSNYPLELKEIYNPPVLLFYQGNIELLSKPKL 108
F+E + +L++ ++ +K V +S + +YP L+EI PP +LF +G+ ELL + +
Sbjct: 41 FLEKCGKEELERQKELIRKHNVKLVSFWEDDYPQHLREIRYPPAVLFVRGDAELLKEKCV 100



Query: 109 AVVGARQASQIGCQSVKKIIKETNNQFVIVSGLARGIDTAAHVSALKNGGSSIAVIGSGL 168

VVG R+ + G K+ +K + FVIVSG+A GID+ AH AL +GG ++AV+G+G+
Sbjct: 101 GVVGTRRPTSYGVNVTKRFVKLLSEYFVIVSGMAFGIDSVAHKEALSSGGKTVAVLGTGV 160

Query: 169 DVYYPTENKKLQEYMSYNHLVLSEYFTGEQPLKFHFPERNRIIAGLCQGIVVAEAKMRSG 228

DV YP N++L + N V+SEY G + K HFP RNRIIAGL I+V EA ++SG 
Sbjct: 161 DVVYPRSNERLFHEIVKNGCVVSEYPMGTRARKHHFPARNRIIAGLSDAIIVTEAPIKSG 220 

Query: 229 SLITCERALEEGREVFAIPGNIIDGKSDGCHHLIQEGAKCIISGKDILSEY 279 

+LIT + ALE GR+VFA+PG+I S+G ++LI+ GA + +D+ + + 
Sbjct: 221 ALITVKFALESGRDVFAVPGDIDRKTSEGTNYLIKSGAYPLTDEEDLETHF 271 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1921> which encodes the amino acid 
sequence <SEQ ID 1922>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results
bacterial cytoplasm --- Certainty=0.2856 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 185/279 (66%), Positives = 238/279 (84%), Gaps = 1/279 (0%)

Query: 1 MNHFELFKLKKAGLTNLNIHNIINYLKKNSLTSLSVRNMAWSKCKNPTFFIENYKQLDL 60

+NHFEL+KLKKAGLTN NI NI++Y +K+ SLS+R+MAWS CK+P+ FIE YKQLD+
Sbjct: 1 VNHFELYKLKKAGLTNKNILNILDY-QKHQEKSLSLRDMAWSGCKHPSHFIEAYKQLDI 59

Query: 61 KKLRQEFKKFPVLSILDSNYPLELKEIYNPPVLLFYQGNIELLSKPKLAVVGARQASQIG 120

+ L+ EFK+FP +SILD +YP+ LKEIYNPPVLLF+QGN++LL KPKLA+VG+R++S G 
Sbjct: 60 QNLKMEFKQFPSISILDKHYPMALKEIYNPPVLLFFQGNLDLLEKPKIAIVGSRRSSDTG 119 

Query: 121 CQSVKKIIKETNNQFVIVSGLARGIDTAAHVSALKNGGSSIAVIGSGLDVYYPTENKKLQ 180
+SV+KI+KE N+FVIVSGLARGIDT+AH++ LKNGG +IA+IG+GLD +YP EN++LQ
Sbjct: 120 VKSVRKILKELGNRFVIVSGLARGIDTSAHLACLKNGGQTIAIIGTGLDRFYPKENRELQ 179



Query: 181 EYMSYNHLVLSEYFTGEQPLKFHFPERNRIIAGLCQGIVVAEAKMRSGSLITCERALEEG 240
++ NHLVL+EY GE+ L +HFPERNRIIAGL +GI+V EAK RSGSLITC+ +EEG
Sbjct: 180 TFLGKNHLVLTEyGPGEEALSYHFPERNRIIAGLSRGILWEAKNRSGSLITCQIGIEEG 239 

Query: 241 REVFAIPGNIIDGKSDGCHHLIQEGAKCIISGKDILSEY 279 
R++FA+PGNI+DGKS+GC LI+EGA C+ SG DILSEY

Sbjct: 240 RDIFAVPGNILDGKSEGCLQLIKEGATCVTSGMDILSEY 278

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 619

A DNA sequence (GBSx0659) was identified in S.agalactiae <SEQ ID 1923> which encodes the amino 
acid sequence <SEQ ID 1924>. This protein is predicted to be lipoprotein (ceuE). Analysis of this protein 
sequence reveals the following: 

Possible site: 24
>>> Seems to have a cleavable N-term signal seq.

Final Results
bacterial outside --- Certainty=0.3000 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database: 

. [Staphylococcus aureus] 
es = 201/348 (57%), Gaps = 16/348 (4%) 

Query: 1 MTKKLIIAILALCTILTTSQAVLAKEKSQ TVTIKNNYSVYIKKEKRDKPDNK 52 

M K ++ +LA+ +L KE+S+ TV I+NNY + + EK+D D K 

Sbjct: 1 MKKTVLYLVIJWMFLlAACGNNSDKEQSKSETKGSKDTvK 58 

Query: 53 KQISETLKVPLKPKIWWFDMGALDTITALGAEKSVIGIPKAKNALSLLPNNVKSVYKAK 112 

K + ET++VP P+ W D GALD + +G V +PK + SL PN ++S +K 
Sbjct: 59 K- VKETVEVPKNPE^VVLDYGALDVMKEMGLSDKV'JCALPKGEGGKSL- PNFLES - FKDD 115 

Query: 113 RYQDVGSLFEPNFFAIARMQPDWFLGAR^SVDNIEKLKEARPKAALVYAGVDSKKVFD 172 

+Y +VG+L E NF+ IA +P+V+-F+ R A+ W+++ K+AAPKA +VY G D K + 
Sbjct: IIS KYTWGNLKEVNFDKIAATKPEVIFISGRTANQKNLDEFKKARPKAKIVYVGADEKNLIG 175 

Query: 173 KGVAERVTMLGKIFDQNKKAKTFNKDIAQAVLKLQKTIEKKGKPTALFVMANSGELLTQS 232 

+ 4- +GKI+D+ KAK NKD+ + ++ + K T ++++ N GEL T 
Sbjct: 176 S-MKQNTENIGKIYDKEVKAKEI^KDLDNKIASKKDKTKNFHK-TVMYLLVNEGELSTFG 233

Query: 233 PSGRFGW-IFSVGGFKAVNENEKLSSHGTPVSYEYIAEKNPNYLFVLDRGATIGQGASSK 291 

P GRFG ++ GF AV++ S+HG VS EY+ ++NP+ + +DRG + +++K 
Sbjct: 234 PKGRFGGLVYDTLGBWAVDKKVSNSIfflGQNVSNEYVKTJiINPDVIL^RGQAVSGKSTAK 293 

Query: 292 ELFNNDVIKATDAVKNKRVHEVDGKDWYINSGGSRVTLRMIKDVQNFV 339 

+ NN V+K A+K +V+ +D K WY +G + T++ V 
Sbjct: 294 QALNNPVLKNVKAIKEDKVYI^DPKLWYFARGSTTTTIKQIEELDKW 341 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1925> which encodes the amino acid
sequence <SEQ ID 1926>. Analysis of this protein sequence reveals the following:

Possible site: 32 

>>> May be a lipoprotein

Final Results

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
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An alignment of the GAS and GBS proteins is shown below: 

Identities = 57/255 (22%) , Positives = 104/255 (40%) , Gaps = 30/255 (11%) 

Query: 66 KKIWrFDMGMiDTITMjGREKSVIGIPKAKNALSLLPNNVKSVYKAKRYQDVCSSLFBPNF 125 

+++V + +D L + ++G+ +K L LP +V + VG P+ 

Sbjct: 45 QRIVATSVAWD1 CDRLNLD - - LVGVCDSK- -LYTLPKRYDAVKR VGLPMNPDI 94 

Query: 126 FAIAEJMQPDWFLGARMASVTINIEKLKFAAPKAAL^^ 185 

E IA ++P + + E L+ K Y ++ + V +G+ + + LG + 

Sbjct: 95 ELIASLKPTWILSPNSLQ EDLEPKYQKLDTEYGFLNLRSV--EGMYQSIDDLGNL 147 

Query: 186 FDQNKKAKTFNKDIAQAVLKLQKTIEKKGKPTALFVl^ANSGELLTQSPSGRFGWIFSVGG 245 

F + ++AK + Q+KKPL+MGL + G++G 

Sbjct: 148 FQRQQEAKELRQQYQDYYRAFQAKRKGKKKPKVLILMGLPGSYIiVATNQSYVGNIjLDLRG 207 

Query: 246 FKAV NENEKLSSHGTPVSYEYIAEKNPMYLFVLDRGATIGQGAS SKELFMNDVI 299 

+ V +E E LS++ E + K P+ +L I KE ND+ 

Sbjct: 208 GENVYQSDEKEFLSANP EDMLAKEPD- -LILRTAHAIPDKVKVMFDKEFAENDIW 260 

Query: 300 KATDAVKNKRVHEVD 314 

K AVK +V+++D 
Sbjct: 261 KHFTAVKEGKVYDLD 275 

SEQ ID 1924 (GBS181) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 39 (lane 5; MW 38.7kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 46 (lane 3; MW 64kDa). 

The GBS181-GST fusion product was purified (Figure 204, lane 9) and used to immunise mice. The 
resulting antiserum was used for FACS (Figure 299), which confirmed that the protein is immunoaccessible 
on GBS bacteria. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 620 

A DNA sequence (GBSx0660) was identified in S.agalactiae <SEQ ID 1927> which encodes the amino 
acid sequence <SEQ ID 1928>. This protein is predicted to be iron(III) ABC transporter, ATP-binding 
protein. Analysis of this protein sequence reveals the following: 

3 N- terminal signal sequence 



Final Results 

bacterial cytoplasm --- Certainty=0.3231 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
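The "Final Results" lines above follow a fixed pattern (site, certainty value, call), so they can be read mechanically. The following is a minimal sketch, not part of the original analysis; the function names `parse_final_results` and `best_site` are hypothetical, and the pattern is written to tolerate the stray spaces that appear in some certainty values.

```python
import re

# One "Final Results" line: "bacterial <site> --- Certainty=<value> (<call>)".
# The pattern and the names used here are illustrative assumptions.
LINE_RE = re.compile(
    r"bacterial\s+(?P<site>[a-z]+)"   # localisation site, e.g. "membrane"
    r"[^A-Za-z]*Certainty\s*=\s*"     # optional dashes, then the key
    r"(?P<value>[0-9][0-9. ]*)"       # value, possibly with stray spaces
    r"\((?P<call>[^)]*)\)"            # "Affirmative" or "Not Clear"
)


def parse_final_results(text):
    """Return a list of (site, certainty, call) tuples found in text."""
    results = []
    for match in LINE_RE.finditer(text):
        value = float(match.group("value").replace(" ", ""))
        results.append((match.group("site"), value, match.group("call").strip()))
    return results


def best_site(results):
    """Return the entry with the highest certainty value."""
    return max(results, key=lambda entry: entry[1])


sample = """
bacterial cytoplasm --- Certainty=0.3231 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
"""
print(best_site(parse_final_results(sample)))
```

For the block above, the highest-certainty entry is the cytoplasm line, which is also the only one marked "Affirmative".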

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAB12190 GB:Z99106 similar to ferrichrome ABC transporter 
(ATP-binding protein) [Bacillus subtilis] 
Identities = 125/247 (50%) , Positives = 187/247 (75%) 

Query: 1 MIQINNLHKFYGQKEILKDINISIPKGKVTAILGPNGSGKSTLLSCISRLEPYDNGEIFL 60 

M ++ + N+ K YG K +L++ +++I KGK+T+ +GPNG+GKSTLLS +SRL D+GEI++ 
Sbjct: 1 MVEVTUWSKQYGGKVVLEETSVTIQKGKITSFIGPNGAGKSTLLSIMSRLIKKDSGEIYI 60 

Query: 61 DKVPLAHYSSNDLAKT1AILRQSNHLTLKIKVRDLIGFGRFPYSKGRLSQKDKAVIESVI 120 

D + S +LAK ++IL+Q+N + +++ ++DL+ FGRFPYS+GRL+++D I + 
Sbjct: 61 DGQEIGACDSKELAKKMSILKQANQINIRLTIKDLVSFGRFFYSQGRLTEEDWVHINQAL 120 
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Query: 121 SYMDIOTIADEFimLSGGQIQRAFIJ^T^QDTQYICLDEPLKNLDMICyAVQMMDLIKR 180 

SYM L DI D++++ LSGGQ QRAFIAM +AQDT YI LDEPLNNLDMK++V+4M L+KR 
Sbjct: 121 SYMKLEDIQDKYLDQLSGGQCQRAFIAMVIAQDTDYIFLDEPLKNLDMKHSVEIMKLLKR 180 

Query: 181 YAYEFNKTIVIIIHDINFATHYADNWALKEGQV\?TCGTVEDVMQEKILSHLFDMPIRIE 240 

E KTIVI+IHDINFA+ Y+D +V7ALK G++V G E++++ +L ++DM I 1 + 
Sbjct: 181 LVEELGKTIVIVIHDINFASVYSDYIVALKNGRIVKEGPPEEMIETSVLEEIYDMTIPIQ 240 

Query: 241 TVDGKPI 247 

T+D + I 
Sbjct: 241 TIDNQRI 247 
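The summary line of an alignment such as the one above ("Identities = 125/247 (50%) , Positives = 187/247 (75%)") can be checked by recomputing the percentages from the raw counts. The sketch below is illustrative only; `parse_summary` is a hypothetical helper, and the use of integer truncation is an assumption that happens to agree with the figures printed for this alignment.

```python
import re

# Matches "Identities = 125/247", "Positives = 187/247", "Gaps = 8/324".
SUMMARY_RE = re.compile(r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)")


def parse_summary(line):
    """Return {field: (count, length, truncated percent)} for a summary line."""
    stats = {}
    for name, count, length in SUMMARY_RE.findall(line):
        count, length = int(count), int(length)
        # Assumption: percentages are truncated, not rounded (125/247 = 50.6%).
        stats[name] = (count, length, 100 * count // length)
    return stats


line = "Identities = 125/247 (50%) , Positives = 187/247 (75%)"
for field, (count, length, percent) in parse_summary(line).items():
    print(f"{field}: {count}/{length} = {percent}%")
```

Recomputing the values in this way is also a quick check for digits that were damaged during text extraction.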

There is also homology to SEQ ID 1930. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.


Example 621 

A DNA sequence (GBSx0661) was identified in S.agalactiae <SEQ ID 1931> which encodes the amino 
acid sequence <SEQ ID 1932>. Analysis of this protein sequence reveals the following: 



INTEGRAL Likelihood =-12.74 Transmembrane 271 - 287 ( 266 - 295) 
INTEGRAL Likelihood = 

Transmembrane 185 - 201 ( 178 - 207) 

Transmembrane 112 - 128 ( 105 - 132) 

Transmembrane 231 - 247 ( 227 - 261) 



INTEGRAL Likelihood = -8 

INTEGRAL Likelihood = -7. 

INTEGRAL Likelihood = -7 

INTEGRAL Likelihood = -2 

INTEGRAL Likelihood = -1 



139 - 155 ( 135 - 156) 
97 Transmembrane 302 - 318 ( 301 - 319) 



Final Results

bacterial membrane --- Certainty=0.6095 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:CAB12189 GB:Z99106 similar to ferrichrome ABC transporter 
(permease) [Bacillus subtilis] 
Identities = 138/315 (43%) , Positives = 222/315 (69%) , Gaps = 6/315 (1%) 

Query: 9 KLLILLILLIAAIILFLIYGIPTDANEFLIIYILKTRYQKLIALILVGICIGSSSLIFQT 68
K+ +L+ LI I LFL Y + Y L R +K+ A++L G I S++IFQT 

Sbjct: 6 KIALLVGLMVCIGLFLFYDLGNWD YTLPRRIKKVAAIVLTGGAIAFSTMIFQT 59 

Query: 69 LTNWRLLTPSIIGLDSLYILIQTGLMYLIGAQRVIKFSSFSSFLLSLLLMVGFAYLLFTI 128 
+TNNR+LTPSI+GLDSLY+LIQTG+++L G+ ++ + +F++S+LLM+ F+ +L+ I

Sbjct: 60 ITNNRILTPSILGLDSLYMLIQTGIIFLFGSANKVI.MNKNINFIISVLLMILFSLVLYQI 119 

Query: 129 LFRNKKQSLYFVLLAGLIFNTLFSSISSFIQAIMDPNDFMILQNQLFASFNAINTKILWI 188 
+F+ + ++++F+LL G++F TLFSS+SSF+Q ++DPN+F ++Q+++FASFN INT +LW+ 
Sbjct: 120 MFKGEGRNIFFLLLIGIVFGTLFSSLSSFMQMLIDPKEFQWQDKMFASFNNINTDLLWL 179

Query: 189 SFIIIWSFVINWPFIKELDVLLLGKENAISLGISYQKLTTRFFLWLALMVAIATALVGP 248 

+FII +++ V W F K DVL LG+E+A++LGI Y K+ + + +A++V+++TALVGP 
Sbjct: 180 AFIIFLLTGVYVVffiFTKFFDVLSLGREHAWLGIDYDKOTKQMLIWAILVSVSTALVGP 239 


Query: 249 ITFLGLLVAHITYHSFHTFRHQILVPIAIVICIFTLVLGQHLVQNLLHLTVQLSVLLNLI 308 

I FLGLLV ++ T++H L+ ++ I I LV GQ +V+ + + LSV++N 

Sbjct: 240 IMFLGLLVVNLAREFLKTYKHSYLIAGSVFISIIALVGGQFWEKVFTFSTTLSVIINFA 299 



Query: 309 GGSYFIFTLIKGRKN 323

GG YFI+ L+K K+ 
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-712- 



Sbjct: 300 GGIYFIYLLLKENKS 314 



A related DNA sequence was identified in S.pyogenes <SEQ ID 1933> which encodes the amino acid
sequence <SEQ ID 1934>. Analysis of this protein sequence reveals the following:



INTEGRAL 
INTEGRAL 
INTEGRAL 



3 W-terminal signal sequence 



Likelihood =-13 
Likelihood = -8 
Likelihood = -8 
Likelihood = -8 
Likelihood = -6 
Likelihood = -4 
Likelihood = -3 
Likelihood = -2 
Likelihood = -1 
Likelihood = -0. 



Transmembrane 
Transmembrane 
Transmembrane 
Transmembrane 

Transmembrane 



259 - 275 

296 - 312 

83 - 99 

212 - 228 

113 - 129 

140 - 156 

165 - 181 

327 - 343 



246 - 286! 

294 - 316! 

78 - 1041 

210 - 231! 

110 - 132! 

134 - 157: 

165 - 18i; 

327 - 343', 



Final Results 

bacterial membrane --- Certainty=0.6456 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



A related sequence was also identified in GAS <SEQ ID 9175> which encodes the amino acid sequence 
<SEQ ID 9176>. Analysis of this protein sequence reveals the following: 



Possible site: 49

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood =-13.64 Transmembrane 24 - 40 ( 17 - 52)
INTEGRAL Likelihood = -8.97 Transmembrane 250 - 266 ( 237 - 277)
INTEGRAL Likelihood = -8.65 Transmembrane 287 - 303 ( 285 - 307)
INTEGRAL Likelihood = -8.39 Transmembrane 74 - 90 ( 69 - 95)
INTEGRAL Likelihood = -6.26 Transmembrane 203 - 219 ( 201 - 222)
INTEGRAL Likelihood = -4.04 Transmembrane 104 - 120 ( 101 - 123)
INTEGRAL Likelihood = -3.61 Transmembrane 131 - 147 ( 125 - 148)
INTEGRAL Likelihood = -2.71 Transmembrane - 172 ( 156 - 172)
INTEGRAL Likelihood = -1.06 Transmembrane 318 - 334 ( 318 - 334)
INTEGRAL Likelihood = -0.22 Transmembrane 41 - 57 ( 41 - 57)



Final Results 

bacterial membrane --- Certainty=0.646 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below: 

Identities = 80/326 (24%), Positives = 157/326 (47%), Gaps = 34/326 (10%)

Query: 10 LLILLILLIAAIILFLIYGIPTDANEFL IIYILKTRYQKLIALILVGICI 59
+L++L LL A+I + G+ + + I R+ +++ +L G I
Sbjct: 34 VLLILSLLFLAVIALSLGGLAVSYGAIVKGLFVAYDPQVALIYDLRFPRIVIALLAGAGI 93

Query: 60 GSSSLIFQTLTNNRLLTPSIIGL DSLYILIQTGLMYLIGAQRVIKFSSFSSFL L 113
S ++FQ + N + P+IIG+ S +L+ + L+ +++ + SFL +
Sbjct: 94 AVSGVLFOAVLKNPI SDPAI IGI CSGAS FMVLVS SLLL PQLLLYGPIVSFLGGGV 148

Query: 114 SLLLMVGFAYLLFTILFRNKKQSLYFVLLAGLIFNTLFSSISSFIQAIMDPNDFMILQNQ 173
S LL+ G A+ K + ++L G+ N LF +S+ + + M+ N
Sbjct: 149 SFLLIYGLAW KKGLNPIRLILTGIAINALFMGLSTALTSFFTSASPMV- -NA 198

Query: 174 LFAS FNAINTKI - LWIS FI 1 1 WSFVINWPFIKELDVLLLGKENAISLGI S YQKLTTRFF 232
LA+T+ + F +++ K ++LLL + LGI L
Sbjct: 199 LLAGHISQKTWADVGVLFPYTFIGLLLALLLSKTCNLLLLDDQVIRHLGIDATALRLGIS 258

Query: 233 LTO^ALMVAIATALVGPITFLGLLVAHITYHSFHTFRHQILVPIAIVICIFTLVLGQHLVQ 292
L L+ ++AT++VG 4+FLGL+V H++ + +HQIL+P + ++ F +L L +




Sbjct: 259 LVAVLtASVATSIVGWSFLGLIVPHMSRLLVGS-KHQILIPFSALLC3APVFLLRDTU3R 317 

Query: 293 NLLH - LTVQLSVLLNLIGGSYFI FTL 317 

+L + L + ++++-+++GG YFI+ L 
Sbjct: 318 SLAYPLEISPAIIMSIVGGPYFIYI.L 343 

A related DNA sequence was identified in S. pyogenes <SEQ ID 2491> which encodes amino acid sequence
<SEQ ID 2492>. An alignment of the GAS and GBS sequences follows:

Score = 51.9 bits (122), Expect = 5e-08 

Identities = 73/327 (22%) , Positives = 137/327 (41%) , Gaps = 38/327 (11%) 

Query: 494 rGIAIAFRGLGJ^IAMVPPTTWLALGTAILMVGAAFALAGTQA 553
L IA +GA + +V A+ L++ A
Sbjct: iPLVIAIGAIGAPVGI WAAIVGAIAVITLI IQAIMNWGA- - - 629

Query: 554 {XXXTDSLATLLTI IANAIGSMLPIVAGAISQI VG A 606
++ T T A + ++G S +V +
Sbjct: 630 '- - -TAWSNFTAWLSGLWSSWSTGQSLWSS 684




Identities = 83/477 (17%) , Positives = 175/477 (36%) , Gaps = 103/477 (21%) 

Query: 420 GSFLDKISTKFGLFGKKAKEGTD QAANGSRKSGGI I SQI FNGLGNI 465 

G + +++T+FGL G+K K ++ +A ++++ LG + 

Sbjct: 313 GDAVGELNTQFGLTGEKLKSASELLIKYAEINETDISSSAISAKQAIEAYGLTAEDLGMV 372 

Query: 466 VKSAGTAISTAAKGIGTGIKTALSGAPPIISSLGTAISTVA QGIGTGLAIA- 516 

+ + A + + T ++ A+ GAP I LG + A G+ + A++ 

Sbjct: 373 LDNVTKAAQDTGQSVDTIVQKAIDGAPQ-IKGLGLSFEEGAALIGKFEKSGVDSSAALSS 431 

Query: 517 FRGLGAAIAMVPPTT- -WLALGTAILMVGAAFALAGTQA 553 

GL ++ + +T AL A + G+ A A 
Sbjct: 432 LSKAAVIYAKDGKTLTDGLNETVSAIQNSTSETEALSIASEIFGSKAAPRMVDAIQRGAF 491 

Query: 554 --DGISQILRTIGDXXXXXXXXXTDSLATLI.TI IANAIGSMLPIVAGAISQIV 604 

D +++ ++ D + L +A G +L V A+ ++ 

Sbjct: 492 SFDDLAEAAKSSSGTVSTTFDETLDPIDKLTQYSKQAKEGMAELGGKLLETVIPALEPLM 551 

Query: 605 GAVAGGLS QLII AVSTGVSLVIGAFTGL LGGISGVINSISAVTQ 648 

G + ++■ Q 1+ V4-T V +++GA L +G I + + A I 

Sbjct: 552 GMDESSVNWFTSI«TDQQTIVILGLVTTAVMMLLGAIAPLVIAIGAIGAPVGIWAAIV 611 

Query: 649 SLTGVITAVFNGI ATVISSVGSTIKDVLTGLGTAFEGFGNGVK 691 

VIT +1 AS + + 1 T+F++G+ 

Sbjct: 612 GAIAVITLIIQAIMJWGAITF^QSTWDSCAATOSELVCTIVTTATTAWSNFTAWLSGLW 671 

Query: 692 SALEGVG-AVIESFGSAVRNV LDGVANILDSMGTAALNAGRGVKEMAKGIKMLVDL 746 

S++ G ++ SF S++ N+ + G ++ S + N G+ + 
Sbjct: 672 SSWSTGQSLWSSFTSSLSNIFSSLITGAQSLWSSFTSTLSKLWSGLVSTGSNL 725 

Query: 747 SLGDLVATLaAVASGLGKMASSAGEMTTIfiSAMSKVnWGMTRLATSATIAITGLTVF 803
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+L +T++ + +G+ +++++ ++ S +S +G ++ AI ti F 

Sbjct: 726 -FNNLSSTISGIFNGI- -LSTASNIWNSIKSTISNMDCSMOJaVSNGVNRIKNIjFNF 779 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.



Example 622 

A DNA sequence (GBSx0662) was identified in S.agalactiae <SEQ ID 1935> which encodes the amino 
acid sequence <SEQ ID 1936>. Analysis of this protein sequence reveals the following: 



Possible site: 13 



d N- terminal signal sequence 



Final Results

bacterial cytoplasm --- Certainty=0.2277 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.



Example 623 

A DNA sequence (GBSx0663) was identified in S.agalactiae <SEQ ID 1937> which encodes the amino 
acid sequence <SEQ ID 1938>. This protein is predicted to be membrane protein (ceuB). Analysis of this 
protein sequence reveals the following: 



3 N-terminal signal sequence 



INTEGRAL 
INTEGRAL 
INTEGRAL 
INTEGRAL 
INTEGRAL 
INTEGRAL 



Likelihood = 
Likelihood = 
Likelihood = 
Likelihood = 
Likelihood = 
Likelihood = 
Likelihood = 
Likelihood = 
Likelihood » 
Likelihood = 



Transmembrane 
Transmembrane 
Transmembrane 



85 



- 257 ( 237 -
- 143 ( 118 -
- 1S8 ( 150 -
- 328 ( 309 -
- 305 ( 287 -
- 40 ( 22 -
- 85 ( 68 -
- 216 ( 198 -
- 123 ( 107 -
- 274 ( 258 -

Final Results

bacterial membrane --- Certainty=0.5522 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



Transmembrane 107 



A related GBS nucleic acid sequence <SEQ ID 8621> which encodes amino acid sequence <SEQ ID 8622> 
was also identified. Analysis of this protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 2
SRCFLG: 0
McG: Length of UR: 23
Peak Value of UR: 2.64
Net Charge of CR: 2
McG: Discrim Score: 8.59
GvH: Signal Score (-7.5): -4.6
Possible site: 26
>>> Seems to have an uncleavable N-term signal seq
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Amino Acid Composition: calculated from 1 



it: 9 value 
Likelihood 
Likelihood 

Likelihood = • 

Likelihood = ■ 

Likelihood = ■ 

Likelihood = • 

Likelihood = • 

Likelihood = ■ 

Likelihood = • 

Likelihood = 
* score: 2.76 
CFP: 0.552 



11.3 



Transmembrane 
Transmembrane 
Transmembrane 
Transmembrane 



- 242 ( 222 - 259) 

- 128 ( 103 - 134} 

- 153 ( 135 - 159) 

- 25 ( 7-31) 

- 70 ( 53 - 71) 

- 201 ( 183 - 201) 

- 284 ( 265 - 284) 

- 108 ( 92 - 108) 

- 259 ( 243 - 259) 



Reasoning Step: 3

Final Results

bacterial membrane --- Certainty=0.5522 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAB12188 GB:Z99106 similar to ferrichrome ABC transporter 
(permease) [Bacillus subtilis] 
Identities = 149/304 (49%) , Positives = 234/304 (76%) 







Query: LVILSLTSLPVGVKSIPLEQITHLDQSQVDIFLTSRLPRT1SILISGASLSVCGLLMQQL 88
L+IL++TS+F+GV+ + + L + + SRLPR ISI+I+G S+S+CGL+MQQ+
Sbjct: 10 LIIIAVTSVFIGVEDLSPLDLFDLSKQEASTLFASRLPRL1SIVIAGI.SMSICGLIMQQI 69

Query: 89 TQNKFVSPTTSGTMDWAKLGVWTLIFFKNTSIFIQLCIASGFAILGSLLFVTILKMITF
++NKFVSPTT+GTMDWA+LG++++L+ F + S I++ +A FA+ G+ LF+ IL+ I F
Sbjct: SRNKFVSPTTAGTMDWARLGILISLLLFTSASPLIKMLVAFVFALAGNFLFMKILERIKF 129

Query: 149 KDNIFIPLIGLMLGQIVARATVFLGTHFQVLQSVNSWLQGNFSIMTSHRYEILYIjALPCL 208
D IFIPL+GLMLG IV++ F+ + ++Q+V+SWLQG+FS++ RYE+LYL++P +
Sbjct: 130 mTIFIPLVGLMLGNIVSSIATFIAYKYDLIQWSSWLQGDFSLWKGRYELLYLSIPLV 189

Query: 209 FLVYFFAHQFTIVGLGESFAKNLGVAYEKMIYFGLVLVSIMTSLVIIIVGALPFLGLIVP 268
+ Y +A +FT+ G+GESF+ NLG+ Y++++ GL++VS++TSLVI+ VG LPFLGLI+P
Sbjct: 190 I IAYVYADKFTLAGMGES FSVNLGLKYKRVVNIGLI I VSL ITSLVILTVGMLPFLGL IIP 249

Query: 269 NLISITKGDHMSSTILETSLLGACIVMICDLFGRLVIFPYEVSIGVTLGVLGSAFFLISI 328
N++SI +GD++ S++ T LLGA V+ CD+ GR++IFPYE+SIG+ +G++GS FL +
Sbjct: 250 NIVSIYRGDNLKSSLPHTVLLGAVFVLFCDILGRIIIFPYEISIGLMVGIIGSGIFLFML 309

Query: 329 IRNE 332
+R +
Sbjct: 310 LRRK 313





There is also homology to SEQ ID 1940. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 624 

A DNA sequence (GBSx0664) was identified in S.agalactiae <SEQ ID 1941> which encodes the amino 
acid sequence <SEQ ID 1942>. Analysis of this protein sequence reveals the following: 

Possible site: 35 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -0.90 Transmembrane 140 - 156 ( 140 - 156) 
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Final Results 

bacterial membrane --- Certainty=0.1362 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
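The "INTEGRAL Likelihood ... Transmembrane ..." lines reported in these analyses share one layout: a likelihood value, the predicted segment, and a second range in parentheses. A minimal parsing sketch follows; the name `parse_tm_segments` and the field names `core` and `outer` are hypothetical labels chosen for illustration, and the meaning of the parenthesised range (apparently a wider window around the segment) is an assumption.

```python
import re

# Layout assumed from lines such as:
# "INTEGRAL Likelihood = -0.90 Transmembrane 140 - 156 ( 140 - 156)"
TM_RE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)


def parse_tm_segments(text):
    """Return one dict per fully legible INTEGRAL line in text."""
    segments = []
    for lik, start, end, outer_start, outer_end in TM_RE.findall(text):
        segments.append({
            "likelihood": float(lik),
            "core": (int(start), int(end)),                # predicted segment
            "outer": (int(outer_start), int(outer_end)),   # range in parentheses
        })
    # More negative likelihood values are listed first in the reports.
    return sorted(segments, key=lambda seg: seg["likelihood"])


sample = "INTEGRAL Likelihood = -0.90 Transmembrane 140 - 156 ( 140 - 156)"
for seg in parse_tm_segments(sample):
    start, end = seg["core"]
    print(seg["likelihood"], start, end, end - start + 1)
```

Lines whose values were truncated during extraction do not match the pattern and are simply skipped, which keeps incomplete entries out of any downstream count.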

The protein has homology with the following sequences in the GENPEPT database: 

>GP:BAB06720 GB:AP001517 maltose transacetylase (maltose 
O-acetyltransferase) [Bacillus halodurans]
Identities = 93/182 (51%) , Positives = 125/182 (68%) , Gaps = 2/182 (1%) 

Query: 2 TEKEKMIAGQYYRPSAPEIjRKDREVJUJK]MOAF^IN--EDNSSKRNVILQKWFGATGKSIH 59 

TEKEKMLAG+ Y+ PEL KDRE A + + FN E +R ++++ FG+ G+S++ 
Sbjct: 3 TEKEKMLAGERYKAWDPELVKDRERARRLTRLFNQTTETHEKQRTELIKELFGSMGESVN 62 

Query: 60 MEQRFVCDYGCNI YVGENFYANFNQTFLDVCE I R I G3NCMFGPNCQLLTPLHPLDP IERN 119 

+E F CDYG NI+VG NF+ANF+ LDVCE+RIG NCM P + T HP+ P+ER 
Sbjct: 63 IEPTFRCDYGYNIHVGNNFFANFDCVILDVCEVRIGANCMLAPGVHIYTATHPIHPLERV 122 

Query: 120 SGLEYGAPIQIGN^W^'7LGGGVTILPGVULGDN 1 AA/GAGSVV^KSFEM^W'IAGNPAK^IKKL 182 

G EYG P+ I NNVW+GG + PGV +G+N V+ +GSWTK NW+AGNPAK+I+ + 
Sbjct: 123 EGPEYGKPVTIRNNVWIGGRAIVNPGVTIGNNAVIASGSVVTKDVPENVvVAGNPAKVIQTI 184 

A related DNA sequence was identified in S. pyogenes <SEQ ID 1943> which encodes the amino acid 
sequence <SEQ ID 1944>. Analysis of this protein sequence reveals the following: 

3 N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.4052 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 68/188 (36%) , Positives = 101/188 (53%) , Gaps = 13/188 (6%) 

Query: 2 TEKEKMLAGQYYRPSAPELRKDREVALKNMQAFN NEDNSSKRNVILQKWFGA 53
TE +KM G++Y + D E+ K M A + +R+ +L + FG
Sbjct: 3 TEFDKMTRGEWY DANFDSELIQKRMMAQDLCFDLNQLKPSREEERSAVLNQLFGQ 57

Query: PAKIIKKL 181
P ++++K+
Sbjct: PCQWRKI 185

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 625 

A DNA sequence (GBSx0665) was identified in S.agalactiae <SEQ ID 1945> which encodes the amino 
acid sequence <SEQ ID 1946>. This protein is predicted to be ribonuclease H (rnhB-2). Analysis of this 
protein sequence reveals the following: 

Possible site: 32 
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Final Results 

bacterial membrane --- Certainty=0.1065 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9823> which encodes amino acid sequence <SEQ ID 9824> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database: 



TIKEIKAILETIVDLKDKRWQEYQTDSRAGVQKAILQRKKNIQSDLDEEARLEQMLVYEK 63 
T+K+IK L+ + D+D + +DRVQ+QK + ++ M YE+ 
TVKDIKDRLQEVKDAQDPFIAQCENDPRKSVQTLTOQWLKKQAKEKftLKEQm/TJMTSYER 64 



LIAG+DEVGRGPLAGPWA+AVILP C+I L DSKK+ +KK +E Y+ 1+ 







Sbjct: 


5 




64 


Sbjct: 


65 


Query: 


124 


Sb j ct : 


125 






Sbjct: 


185 






Sbjct: 


245 



ID+INIYEA+K AM+ A+ LS P++LL+DAM L L Q II 



KGDANSI.SIAAASIVAKVTRDKIMSDYDSTYPGYAFSKNAGYGTKEHLEGLQKYGITPIH 243 
KGDA S+SIAA + +AKVTRD++MS Y TYP Y F KN GYGTKEHLE L YG T +H 

.EALAAYGPTELH 244 



A related DNA sequence was identified in S. pyogenes <SEQ ID 1947> which encodes the amino acid 
sequence <SEQ ID 1948>. Analysis of this protein sequence reveals the following: 

Possible site: 50 



Final Results

bacterial membrane --- Certainty=0.1213 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases: 

>GP:CAB13479 GB:Z99112 ribonuclease H [Bacillus subtilis] 
Identities = 130/252 (51%) , Positives = 176/252 (69%) , Gaps = 3/252 (1%) 

Query: 4 SIKAIKESLEAVTSLLDPLFQEIiATDTRSGVQKALKSRQKYIQAELAEEERLEAMLSYEK 63 

++K IK+ L+ V DP + D R VQ ++ K E A +E+ M SYE+ 
Sbjct: 5 TVKDIKDRLQEVKDAQDPFIAQCENDPRKSVQTLTOQVILKKQAKEKALKEQWVNMTSYER 64 

Query: 64 ALYKKGYKAIAGIDEVGRGPLAGPWAACVILPKYCKIKGLNDSKKIPKAKHETIYQAVK 123 

KG++ IAG+DEVGRGPLAGPWA+ VILP+ C+I GL DSKK+ + K E Y+ + 
Sbjct: 65 IARNKGFRLIAGVDEVGRGPLAGPWASAVILPEECEILGLTDSKKLSEKKREEYYELIM 124 

Query: 124 EKAIAIGIG1 IDNQLIDEVNIYEATKIAr^LEAIKQLEGQLTQPDYLLIDAMTLDIAISQQ 183

++AiA+GIGI++ +IDE+NIYEA+K+AM++AI+ L PDYLL+DAMTL + +Q 

Sbjct: 125 KEAl^VGIGIVEATVIDEINIYEASKMAMVKAIQDLS DTPDYLLVDAMTLPLDTAQA 181 
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Query: 184 SILKGDMfSLSIAAASIVAKVTRDQMMANYDRIFEGYDFAKNAC-YGTKEHLQGLKAYGIT 243 

SI+KGDA S4SIAA + +AKVTRD+MM+ Y +P Y F KN GYGTKEHL+ L AYG T 
Sbjct: 182 SIIKGDAKSVSIAAGACIAKVTRDRMvISAYAETYeMYGFEKNKGYGTKEHLEALAAYGPT 241 

Query: 244 PIHRKSFEPVKS 255 

+HRK+F PV+S 
Sbjct: 242 ELHRKTFAPVQS 253 

An alignment of the GAS and GBS proteins is shown below: 

Identities = 168/256 (65%), Positives = 203/256 (78%), Gaps = 3/256 (1%) 

Query: 1 MMATIKEIKAILETIVDLKDKRWQEYQTDSRAGVQKAILQRKKNIQSDLDEEARLEQMLV 60 

M +IK IK LE + L D +QE TD+R+GVQKA+ R+K IQ++L EE RLE ML 
Sbjct: 1 MPTSIKAIKESLEAVTSLLDPLFQELATDTRSGVQKALKSRQKVIQAELAEEERLEAMLS 60 

Query: 61 YEKKLYIEHINLIAGIDEVGRGPLAGPWAAAVILPPNCKIKHLMDSKKIPKKKHQEIYQ 120 

YEK LY + IAGIDEVGRGPLAGPWAA VILP CKIK LNDSKKIPK KH+ IYQ 
Sbjct: 61 YEKALYKKGYKAIAGIDEVGRGPLAGPWAACVILPKYCKIKGLNDSKKIPKAKHETIYQ 120 

Query: 121 NILDQALAVGIGIQDSQCIDDINIYEATKHAMIDAVSHLS VAPEHLLIDAMVLDLSI 177 

+ ++ALA4-GIGI D+Q ID++NIYEATK AM++A+ L P++LLIDAM LD++I 

Sbjct: 121 AVKEKALAIGIGIIDNQLIDEVNIYEATKLAMLEAIKQLEGQLTQPDYLLIDAMTLDIAI 180

Query: 178 PQTKIIKGDANSLSIAAASIVAKVTRDKIMSDYDSTYPGYAFSKNAGYGTKEHLEGLQKY 237 

Q I+KGDANSLSIAAASIVAKVTRD++M++YD +PGY F+KNAGYGTKEHL+GL+ Y 
Sbjct: 181 SQQSILKGDANSLSIAAASIVAKVTRDQiVIMANYDRIFPGYDFAKNAGYGTKEHLQGLKAY 240 

Query: 238 GITPIHRKSFEPIKSM 253 

GITPIHRKSFEP+KSM 
Sbjct: 241 GITPIHRKSFEPVKSM 256 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 626



A DNA sequence (GBSx0666) was identified in S.agalactiae <SEQ ID 1949> which encodes the amino 
acid sequence <SEQ ID 1950>. Analysis of this protein sequence reveals the following: 
Possible site: 16

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1865 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 627 

A DNA sequence (GBSx0667) was identified in S.agalactiae <SEQ ID 1951> which encodes the amino 
acid sequence <SEQ ID 1952>. Analysis of this protein sequence reveals the following: 

Possible site: 14 

>>> Seems to have no N-terminal signal sequence
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Final Results 

bacterial cytoplasm --- Certainty=0.3034 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 



Query: tiqwfpghmsktu«qvqenikhvdfvt:lvdarlplssqnpmltkivgdkpklmilnkad 62
tiqwfpghm+karr+v e +k 4-d v l+dar+plss+npm+ +iv kp+l++i.nk d
Sbjct: TIQWFPGHMAKARREVTEKLKLIDWIEjLDARVPLSSRNP^DEIVAHKPRLVLLNKDD 61

Query: 123 RTMIIGIPNAGKSTLMNRLAGKKIAWGNKPGVTKGQQMLKSNKELEILDTPGILWPKFE 182
R MI+GIPN GKSTL+NRLA K+IA VG++PG+TK QQW+K KELE+LDTPGILWPKF+
Sbjct: 122 RAMILGIPNVGKSTLINRLASKRIAKVGaRPGITKQQQWIKVGKELELLDTPGILWPKFD 181

Query: 183 DELVGLK1ALTGAIKDQLLPMDEVTIFGLNYFKTYVPDRLKERFKSINLEDEAPEIIMAL 242
D+ G +LA TGAIKD+LL +V +F L Y + YPDRL +R+K L ++ + A+
Sbjct: 182 DQATGFRI^TGAIKDELLDFQDVALFVLRYmEmPDRLMDRYKLNELPEDGVTLFDAI 241

Query: 243 TQKLGY RDDYDRFYNLFVKEVRDGKLGRYTrjDIVGE 278
+K G+ DYD+ + ++E+R G LGR TL++ G+
Sbjct: 242 GKKRGHLLSGGYIDYDKTAEMILRELRAGTLGRITLEVPGK 282



A related DNA sequence was identified in S. pyogenes <SEQ ID 1953> which encodes the amino acid 
sequence <SEQ ID 1954>. Analysis of this protein sequence reveals the following: 

d N- terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.26B8 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 247/282 (87%) , Positives = 265/282 (93%) 

Query: 1 mTIQWFPGHMSKARRQVQENIIOIVDFVTII.VDARLPLSSQNPMLTKIVGDKPKLMILNK 60
MA IQWFPGHMSKAPJIQVQEN+KHVDFVTILVDARLPLSSQNPMLTKIVGDKPKLMIIiNK
Sbjct: 1 ^IQWFPGHMSiCARRQVQENViarVDFVTILVDARLPLSSQNPMLTKIVGDKPI'CLMrLNK 60

ADBAD RTKBW+ 4YESCG+KTIAINSKEQSTVKKVT+ AK LM+DKI LR RGIQKE

TLRTMIIGIPNAGKSTLMNRLAGKKIAWGI.-'-rj'.^r.J, .ji'JLKSNKELEILDTPGILWPK

LT++LG4 +DDYDRFY L7VKEVRDC-KLG+YTLD VG+ E
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Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 628 

A DNA sequence (GBSx0668) was identified in S.agalactiae <SEQ ID 1955> which encodes the amino 
acid sequence <SEQ ID 1956>. Analysis of this protein sequence reveals the following: 

Possible site: 24 

>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9825> which encodes amino acid sequence <SEQ ID 9826> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database: 



Query: 29 DKAKEKASV IKQASQTSQTSKKEVLQKKT YPNI^KYSNLEIHVSSTRQTMT 79 

D A+E AS+ ++ + +T+K + K YP++ K ++ I V+ Q 

Sbjct: 22 DHAEEHASIIWKKIVENITDWKTAKTSIDOTKPSGGEYPDI-KQKHVWIDVNVKEQKAY 80 

Query: 80 ITSNDKVIFKTIVSTG AKESPTPKGTFVIEPERGDFFYNASSKEGAYYWSFKEHGI 135 

I 1+ ++S+G K+ TPKGTF +EPERG++F++ +EGA YWVS+K HG 

Sbjct: 81 IKEGSNTIYTMM1SSGLDQTKDDATPKGTFYVEPERGEWFFSEGYQEGAEYWVSWKNHGE 140 

Query: 137 YLFHSVPTDQQGNEIPEEAKQI^KAASHGCTOMSRADAIOTFYENIPCGTTVTI 189 

+LFHSVP + I EA++LG SHGC+R++ DAKW YENIP+ T V 1 
Sbjct: 141 FLFHSVPMT1CDQIWIKTEAEICLGTKVSHGCIRLTIPDAKWVYENIPEHTICWI 193 

No corresponding DNA sequence was identified in S.pyogenes. 

SEQ ID 1956 (GBS644) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 130 (lane 2 & 3; MW 49.6kDa) and in Figure 186 (lane 3; MW 50kDa). It was 
also expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell extract is shown in Figure 
130 (lane 5-7; MW 24.6kDa) and in Figure 177 (lane 3; MW 25kDa). 

GBS644-GST was purified as shown in Figure 236, lane 9. 
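As a consistency check on the gel data, the difference between the GST-fusion and His-fusion molecular weights quoted for a given protein should be roughly constant, since it reflects the difference between the two tags. The sketch below uses only the MW figures quoted in the text for GBS181 and GBS644; the function name `tag_difference` is hypothetical, and the remark that GST itself is roughly 26 kDa is general background rather than a statement from this document.

```python
# Molecular weights (kDa) quoted in the text for total cell extracts.
observed_mw = {
    "GBS181": {"His": 38.7, "GST": 64.0},
    "GBS644": {"His": 24.6, "GST": 49.6},
}


def tag_difference(mw):
    """Return the GST-fusion MW minus the His-fusion MW, in kDa."""
    return round(mw["GST"] - mw["His"], 1)


for name, mw in observed_mw.items():
    # A difference near 25 kDa is in line with a GST tag of roughly 26 kDa
    # replacing a much smaller His tag (background assumption).
    print(name, tag_difference(mw))
```

Both proteins give a difference of about 25 kDa, so the two sets of figures are mutually consistent.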

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 629 

A DNA sequence (GBSx0669) was identified in S.agalactiae <SEQ ID 1957> which encodes the amino 
acid sequence <SEQ ID 1958>. This protein is predicted to be carbon starvation protein A. Analysis of this 
protein sequence reveals the following: 

Possible site: 19 

>>> Seems to have an uncleavable N-term signal seq 

INTEGRAL Likelihood =-11.25 Transmembrane 129 - 145 ( 122 - 157) 

INTEGRAL Likelihood = -9.92 Transmembrane 316 - 332 ( 305 - 342) 

INTEGRAL Likelihood = -6.42 Transmembrane 164 - 180 ( 157 - 181) 

INTEGRAL Likelihood = -5.73 Transmembrane 443 - 459 ( 441 - 466) 
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INTEGRAL 
INTEGRAL 
INTEGRAL 
INTEGRAL 
INTEGRAL 
INTEGRAL 
INTEGRAL 



Likelihood = 
Likelihood » 
Likelihood = 
Likelihood « 
Likelihood = 
Likelihood = 
Likelihood = 



Transmembrane 362 ■ 



23 



414 - 435)
183 - 209)
359 - 379)
227 - 245)

Final Results

bacterial membrane --- Certainty=0.5501 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAF93852 GB:AE004154 carbon starvation protein A, putative
[Vibrio cholerae]
Identities = 220/470 (46%) , Positives = 311/470 (65%) , Gaps = 16/470 (3%)


Query: 1 MVTFLGGVALLIVGYFTYGRYIEKNFQIDENRQTPAEALRDGYDFVPMPKWKNGMIELLN 60
M+ FL VA L+ GYF YG ++EK F I+E RQTPA DG D+VPM K +++LLN
Sbjct: 1 MLWFLTOTAALVGGYFIYGAITOKVFGINEKRQTPAHTKTDGVDYVPMSTPKVYLVQLLN 60

3 GPIFGPI+GALYGP A +WIV+GCIFAGAVHDY GM+S+RN GA +P 4

++PALF TI+CGAISGFHATQ+P+++R

I EG+IA+IW ++S

LLG G IA +GV++LP++SG +AFRS R I+A

- +PL+VI VLT VDF 4+WRYF +ANQ TAV+ L AT YL+



No corresponding DNA sequence was identified in S. pyogenes. 

A related GBS gene <SEQ ID 8623> and protein <SEQ ID 8624> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible s 



6.07 
-3.54 



1: Signal Score (-7.! 

Possible site: 19 
> Seems to have an uncleavable 



ALOM program 



count: 11 value: 
Likelihood = 
Likelihood . 
Likelihood = 
Likelihood > 
Likelihood = 



I- term signal seq 

0.0 

129 - 145 

316 - 332 

Transmembrane 164 - 180 

Transmembrane 416 - 432 

Transmembrane 190 - 206 



- 157) 

- 342) 

- 181) 

- 435) 
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INTEGRAL 
INTEGRAL 
INTEGRAL 



Likelihood = -4 
Likelihood = -4 
Likelihood = -3 
Likelihood = -2 
Likelihood = -2 
Likelihood = -1 
Likelihood = 0 



modified ALOM score: 



' Reasoning Step: 3 



2.75 



Transmembrane 3 93 



94 ( 70 - 95) 

• 461 ( 441 - 453) 
378 ( 359 - 379) 
244 ( 227 - 245) 

• 18 ( 1-18) 
409 ( 393 - 410) 



Final Results

bacterial membrane --- Certainty=0.5501 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases: 

ORF01729(301 - 1668 of 2082) 

GP|9655126|gb|AAF93852.1||AE004154(1 - 464 of 494) carbon starvation protein A, putative
{Vibrio cholerae} 
%Match =29.9 

%Identity =47.6 %Similarity =68.6 

Matches = 218 Mismatches = 138 Conservative Sub.s = 96 
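
As a rough consistency check, the two percentages above can be reproduced from the raw counts. The sketch below assumes (the listing does not state this) that %Identity is Matches divided by the aligned length, and that %Similarity is Matches plus conservative substitutions divided by the same length. The aligned length of 458 is inferred, not quoted: it is the value that reproduces both printed percentages, and the 6 positions not covered by the three counts are presumed to be gaps.

```python
# Counts copied from the listing above.
matches = 218
mismatches = 138
conservative = 96

# Assumption: aligned length inferred so that both printed percentages are
# reproduced; the positions not covered by the three counts are presumed gaps.
aligned_length = 458
uncounted = aligned_length - (matches + mismatches + conservative)

# Assumed definitions of the two percentages (not stated in the listing).
pct_identity = round(100 * matches / aligned_length, 1)
pct_similarity = round(100 * (matches + conservative) / aligned_length, 1)

print(uncounted, pct_identity, pct_similarity)
```

If the assumed definitions are right, the printed values of 47.6 and 68.6 follow directly from the counts.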



174 



204 



234 



264 



294 



J*AKKFLGGSDMVTFLGGVALLIVGYFTYGRYIEKNFQI 
1= II II III II >>l| I I 
MLWFLTCVAALVGGYFIYGAFVEKVFGI 



DENRQTPAFJ^RDGYDFVPMPKWKNGMIELMIAGTGPIFGPILGALYGPVAYIWIVLGCIFAGAVHDYMIGMISLRNNG 
= 1 Mill II hill I :>>lll|ll llllllhllllll I :|||:|||llllllll Ihhll I 
NEKR0TPAHTKTDGVDWPMSTPKVYLVQLL1>TIAGVGPIFGPIKGALYGPAAMLWIVVGCIFAGAVHDYFSGMLSIRNGG 



AYLPELASRYLGKSMKHVINIFSMLLLILVATVFVVTPANL1LSILPAGT LSLPWIIGLIFVYYLISTVLPIDKALG 

I = 1 = Mil II :|li = : = ll = ll III II =1 = = = I =1= «« =11 ll-H-hll =1 
ASVPSITGRYLGNGAKHFMNIFAIVLLLLVGWFVSAPAGMITNLINQQTDFTVSMTTMWIIFAYYILATIVPVDKIIG 



894 924 954 984 1014 1044 1074 

KVYP VFCTI-LMVSTAAVGFRLLTGGFDMPNLTFETFKM'IHPAGLGIFPALFFTISCGAISGFHATQAPMVSRT 

: II =1 : II = I = III:: - ll = = l = -Mil I I : I I I I I I I I I I I = I - I 

RFYPLFGALLIFMSVGLMTAIAFSSEHQVLGGFEISDM VKNLNPNDMPLWPALFITIACGAISGFHATQSPLMARC 



1104 1134 1164 1191 1221 1251 1281 1311 

50 TVNEREGRFTFYGMMIAEGVIAMIWAGASMSLFKG-QmYEMIAAGTPS 

in in in ii ininn ==i = i = 1 i = 1 i n = in in nnnnni 

INffiNEKNGRFVFYGAMIGEGIIALIWCTVALSFFGSLFjALSEAW 

280 290 300 310 32D 330 340 

55 1341 1371 1401 1431 1461 1491 1521 1551 

LSAFRSLRTIVADYIHVKQDTLPKIFAVTIPLYVISFVLTHVDFl^LTO 

nni i nm n = = =inn nnin -mi =111 nn i n in = m = 

DTAFRSSRLILAEYFNMEQKTLPJ^LLMAVPLFVIGA\Tl/rQVDFGIIVI^^ 

360 370 380 390 400 410 420 

60 

1581 1608 1638 1668 1698 1728 1758 1788 

FVPAMFMLYAWVYIL-SQPIGFNMGLGILTYSLALVLTGIXt/GLF^'KSGQK^ 

limi : :|l I 'II : : I I ' I I 
WPALFMTTVCISFILNSSTLGFGLPMQISTIAGVTlASLGALAYVAKVSKGKGETDIiADEEKPQGVTKTA 
65 440 450 460 470 480 490 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 630 

A DNA sequence (GBSx0670) was identified in S.agalactiae <SEQ ID 1959> which encodes the amino 
acid sequence <SEQ ID 1960>. This protein is predicted to be lytR (lytT). Analysis of this protein sequence 
reveals the following: 
Possible site: 30

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -0.80 Transmembrane 27 - 43 ( 27 - 43)

Final Results

bacterial membrane --- Certainty=0.1319 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 





>GP:AAB481B3 GB:L42945 lytR [Staphylococcus aureus]
Identities = 93/245 (37%), Positives = 150/245 (60%), Gaps = 3/245 (1%)



G++L I KM +PP +IPATA+DQYA+QAFE +A DY+LKP+ R++QA+++V+ 



F Q P+ ++D+I+++ +1+ I 



Query: 178 SLQQWQDKLPSSQFVRVHRSYIVNINAIKTIEPWFNQTLQLHLCNKITVPVSRANVKPLK 237

L +++ +L + F+R+HRSYI+N IK ++ WFN T + LN + + VR++K K
Sbjct: 181 PLNRYEKRLNPTYFIRIHRSYIINTKHIKEVQQWFNYTYMVILTNGVKMQVGRSFMKDFK 240



There is also homology to SEQ ID 460. 

SEQ ID 1960 (GBS399) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 75 (lane 7; MW 30.4kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 84 (lane 2; MW 55kDa). Purified 
GBS399-GST is shown in Figure 217, lane 9; purified GBS399d-GST is shown in Figure 236, lane 3. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 631 

A DNA sequence (GBSx0671) was identified in S.agalactiae <SEQ ID 1961> which encodes the amino 
acid sequence <SEQ ID 1962>. Analysis of this protein sequence reveals the following: 

Possible site: 51

>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood = -7.59 Transmembrane 95 - 111 ( 86 - 116)
INTEGRAL Likelihood = -5.95 Transmembrane 155 - 171 ( 152 - 176)
INTEGRAL Likelihood = -2.28 Transmembrane 189 - 205 ( 187 - 206)
INTEGRAL Likelihood = -1.49 Transmembrane 122 - 138 ( 121 - 138)

Final Results

bacterial membrane --- Certainty=0.4036 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
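
The "Final Results" lines recur in every Example, so a small parser can be handy for tabulating them. The following is a minimal sketch, not part of the original analysis pipeline: the function names `parse_final_results` and `predicted_location` are hypothetical, and the regular expression is written to tolerate the stray spaces that appear inside the certainty values in this listing.

```python
import re

# Tolerates spacing noise such as "Certainty=0 .4036" or "Certainty=0. 0000".
_LINE = re.compile(
    r"bacterial\s+(membrane|outside|cytoplasm)[^A-Za-z]*"
    r"Certainty\s*=\s*([\d.\s]+?)\s*\((Affirmative|Not Clear)\)"
)


def parse_final_results(text):
    """Return {location: (certainty, call)} for each 'bacterial ...' line."""
    results = {}
    for location, number, call in _LINE.findall(text):
        # Remove whitespace inside the number before converting it.
        certainty = float(re.sub(r"\s+", "", number))
        results[location] = (certainty, call)
    return results


def predicted_location(results):
    """Hypothetical helper: the location with the highest certainty."""
    return max(results, key=lambda loc: results[loc][0])


example = """
bacterial membrane Certainty=0 .4036 (Affirmative) < succ>
bacterial outside Certainty=0 . 0000 (Not Clear) < succ>
bacterial cytoplasm Certainty=0 . 0000 (Not Clear) < succ>
"""
results = parse_final_results(example)
print(predicted_location(results), results)
```

Lines whose certainty value is damaged beyond spacing (for example a letter in place of a digit) are simply skipped by this sketch rather than guessed at.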

The protein has homology with the following sequences in the GENPEPT database: 

3 = 2/570 (0%) 

MTLFLIMMERAGLI ILLAYAFVHIPFIKQTLKQPELKKHQYILLILFSLFAIISNFTGVE 6 0 
++L ++++ER GLII4LAY ++IP+ K + + K ++ L I+FSLFA++SN TG4 
LSLTMLLLERVGLIIILAYVLMCIPYFKNLIvMRRTWKARMQLCIIFSLFALMSNLTGIV 61 



f D S+ANTRVLTIGV+GL+GGP VG+ VG++S 



+Y+ISS+ IG+ +G G ++ + + ++G ME++QM+ IL FS D A+ 


















S Q +A II M VS+V++TS++ IL++VG G+DHH+P +ILT L+K 



Q+ LG+AE ++LL+DAE+KSLQAQV+PHF FN++N I L+R++SEKAR+L+ + S 



+EN+ KHAF + + N + V++ + + IIVQDNG GI K+K+ LG+ - 



i ++ A+L+FES SGT 
iKGLFGKSAALQFESTSSGT 570 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1963> which encodes the amino acid 
sequence <SEQ ID 1964>. Analysis of this protein sequence reveals the following: 

Possible site: 39

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -6.79 Transmembrane 283 - 299 ( 276 - 307)
INTEGRAL Likelihood = -5.57 Transmembrane 27 - 43 ( 24 - 48)

Final Results

bacterial membrane --- Certainty=0.3718 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

>GP:CAB54576 GB:AJ006396 histidine kinase [Streptococcus pneumoniae]
Identities = 115/231 (49%), Positives = 159/231 (68%), Gaps = 7/231 (3%)

Query: 351 MI^SIKAYIDEVYVLEVEQRDAQMtlALQSQINPHFLOTrLEyiRMYALSCQQEELADVIY 410 

ML ++ I ++Y LE+ Q+DA MFALQ-S-QINPHF+YNTLE++RMYA+ Q+ELAD+IY 
Sbjct: 1 MLDRLEKNIHDIYQLELSQKDANMRALQAQ1NPHFMYNTLEFLRMYAVMQSQDELADIIY 60 



Query: 527 EHQTT---GNSSIGLQNWLRLFHHFRDRVSWSMAKEPNGGFIIQIRIRKD 574 

EHQ + SIG+ fJV+ R +F DR + ++ G +1 1+ + 

Sbjct: 181 EHQASYSDQRQSIGIVNVHERFVLYFGDRYAITIESAEQAGVQYRITIQDE 231 

An alignment of the GAS and GBS proteins is shown below: 

Identities = 59/180 (32%) , Positives = 97/180 (53%) , Gaps = 8/180 (4%) 

Query: 375 QDAEMKSLQAQVNPHFLFNALNTI - - YGLIRMDSEKARKLVQDFSKVIRANLQFAKQNLI 432 

+DA+M++LQ+Q+NPHFL+N L I Y L E A ++ F+ ++R N+ + K + 

Sbjct: 370 RDAQMRALQSQINPHFLYNTLEYIRMYALSCQQEEIA-DVIYAFASLLRNNISQDK--MT 426 

Query: 433 PLHDELEQVMAYIALEEARFPHMVAFI^NQTNSDDNIjMIPPFTLQVLIENSYKHAFKHV 492 

L +EL Y+ L + R+P+ A+++ + D L IP F +Q L4-EN + H + 

Sbjct: 427 TLKEELAFCEKYIYLYQMRYPDSFAYHVKIDESVAD-LAIPKFVIQPLVENYFVHGIDYS 485 

Query: 493 NKNNQLKVTIARNNDRLHI IVQDNGIGIPKEKLITLGKKTQISKQ- -GSGTAIENLVRRL 550 

+N L + D L I V DNG GI +E+L + K+ Q + S ++N+ RL 

Sbjct: 486 RHDNALSIKALDETDHDLIQVLDNGRGISQERIADMEKRLQEHQTTGNSSIGLQNVYLRL 545 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 632 

A DNA sequence (GBSx0672) was identified in S.agalactiae <SEQ ID 1965> which encodes the amino 
acid sequence <SEQ ID 1966>. Analysis of this protein sequence reveals the following: 

Possible site: 24

>>> May be a lipoprotein

Final Results

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9827> which encodes amino acid sequence <SEQ ID 9828> 
was also identified. 

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 633

A DNA sequence (GBSx0673) was identified in S.agalactiae <SEQ ID 1967> which encodes the amino 
acid sequence <SEQ ID 1968>. Analysis of this protein sequence reveals the following: 

Possible site: 57 





>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -9.55 Transmembrane 52 - 68 ( 45 - 74)
INTEGRAL Likelihood = -9.18 Transmembrane

Final Results

bacterial membrane --- Certainty=0.4821 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>




The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S. pyogenes. 

A related GBS gene <SEQ ID 8625> and protein <SEQ ID 8626> were also identified. Analysis of this
protein sequence reveals the following:

Lipop: Possible site: -1 Crend: 8
McG: Discrim Score: -8.54
GvH: Signal Score (-7.5): -5.6
Possible site: 57

>>> Seems to have no N-terminal signal sequence
ALOM program count: 6 value: -9.55 threshold: 0.0
Likelihood = -9.55 Transmembrane
Likelihood = -9.18 Transmembrane
Likelihood = -8.76 Transmembrane
Likelihood = -7.48 Transmembrane
Likelihood = -3.66 Transmembrane
Likelihood = -1.28 Transmembrane

*** Reasoning Step: 3

Final Results

bacterial membrane --- Certainty=0.4821 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.



Example 634 

A DNA sequence (GBSx0674) was identified in S.agalactiae <SEQ ID 1969> which encodes the amino 
acid sequence <SEQ ID 1970>. Analysis of this protein sequence reveals the following: 



Possible site: 51

>>> Seems to have no N-terminal signal sequence

Likelihood = -0.53 Transmembrane 83 - 99 ( 83 -

Final Results

bacterial membrane --- Certainty=0.1213 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 635 

A DNA sequence (GBSx0675) was identified in S.agalactiae <SEQ ID 1971> which encodes the amino 
acid sequence <SEQ ID 1972>. Analysis of this protein sequence reveals the following: 

Possible site: 23
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1902 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 636 

A DNA sequence (GBSx0676) was identified in S.agalactiae <SEQ ID 1973> which encodes the amino 
acid sequence <SEQ ID 1974>. Analysis of this protein sequence reveals the following: 

Possible site: 20
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4763 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 637 

A DNA sequence (GBSx0677) was identified in S.agalactiae <SEQ ID 1975> which encodes the amino 
acid sequence <SEQ ID 1976>. Analysis of this protein sequence reveals the following: 

Possible site: 20
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.5089 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 638

A DNA sequence (GBSx0678) was identified in S.agalactiae <SEQ ID 1977> which encodes the amino
acid sequence <SEQ ID 1978>. Analysis of this protein sequence reveals the following:

Possible site: 25

>>> May be a lipoprotein

Final Results

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

SEQ ID 1978 (GBS184) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 26 (lane 7; MW 21kDa), in Figure 168 (lane 14-16; MW 36kDa - thioredoxin
fusion) and in Figure 238 (lane 9; MW 36kDa). It was also expressed in E.coli as a GST-fusion product.
SDS-PAGE analysis of total cell extract is shown in Figure 37 (lane 7; MW 46.4kDa).

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 639 

A DNA sequence (GBSx0679) was identified in S.agalactiae <SEQ ID 1979> which encodes the amino
acid sequence <SEQ ID 1980>. Analysis of this protein sequence reveals the following:

Possible site: 52

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2179 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 640 

A DNA sequence (GBSx0680) was identified in S.agalactiae <SEQ ID 1981> which encodes the amino
acid sequence <SEQ ID 1982>. This protein is predicted to be immunogenic secreted protein precursor.
Analysis of this protein sequence reveals the following:

Possible site: 34

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2166 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9351> which encodes amino acid sequence <SEQ ID 9352>
was also identified.

A related DNA sequence was identified in S.pyogenes <SEQ ID 1983> which encodes the amino acid
sequence <SEQ ID 1984>. Analysis of this protein sequence reveals the following:

Possible site: 19
>>> Seems to have an uncleavable N-term signal seq
INTEGRAL Likelihood = -3.77 Transmembrane 9 - 25 ( 5 - 27)

Final Results

bacterial membrane --- Certainty=0.2508 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 64/86 (74%) , Positives = 76/86 (87%) 

Query: 1 MGNGGDWKNKPGYQTTHEAKTGYAISFSPGQAQADRTYGHVAIVEDVKEDGSIPISESNV 60

MGNGGDW+ KPG+ TTH+ K GY +SF+PGQAGAD TYGHVA+VE +KEDGSI ISESNV
Sbjct: 452 MGNGGOTQRKPGFVTTHKPKVGYWSFAPGQAGADATYGHVAVVEQIKEDGSILISESNV 511

Query: 61 LGLGTISYRTFSAAEAAQLTYWG3K 86
+GLGTISYRTF+A +A+ LTYWG+K

Sbjct: 512 MGLGTISYRTFTAEQASLLTYWGDK 537

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
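
The alignment summary lines ("Identities = n/N (p%)") can be cross-checked mechanically. The sketch below is illustrative only: `check_stats` and `flagged` are hypothetical names, and the comparison assumes that the printed percentage is the truncated (not rounded) ratio. That assumption is inferred from this listing, where for instance 93/245 is printed as 37%. Some "Positives" figures in the listing, such as 76/86 printed as 87%, come out one point below the truncated ratio, so a flag is a prompt to review the line rather than proof of a transcription error.

```python
import re

_STAT = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)/(\d+)\s*\((\d+)%\)"
)


def check_stats(line):
    """Return {name: (count, length, printed_pct, truncated_pct)}."""
    stats = {}
    for name, count, length, printed in _STAT.findall(line):
        count, length, printed = int(count), int(length), int(printed)
        # Integer division gives the truncated percentage.
        stats[name] = (count, length, printed, (100 * count) // length)
    return stats


def flagged(stats):
    """Names whose printed percentage differs from the truncated ratio."""
    return [name for name, (_, _, printed, truncated) in stats.items()
            if printed != truncated]


line_a = "Identities = 133/259 (51%) , Positives = 170/259 (65%) , Gaps = 4/259 (1%)"
line_b = "Identities = 64/86 (74%) , Positives = 76/86 (87%)"
print(flagged(check_stats(line_a)))
print(flagged(check_stats(line_b)))
```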

Example 641

A DNA sequence (GBSx0681) was identified in S.agalactiae <SEQ ID 1985> which encodes the amino
acid sequence <SEQ ID 1986>. This protein is predicted to be immunogenic secreted protein precursor.
Analysis of this protein sequence reveals the following:

Possible site: 40
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2495 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:AAB52379 GB:U31811 immunogenic secreted protein precursor [Streptococcus pyogenes]
Identities = 133/259 (51%), Positives = 170/259 (65%), Gaps = 4/259 (1%)

Query: 3 PSQPQVTATPQKSEVVTPAITSGIDLPDVAIPTAMASAAYVKHWIGNDAYTHNLLSHRYG 62 

P QP + A + V P S DL + P++ +SAAYV+HW G+ AYTHNLLS RYG 
Sbjct: 174 PIQPPLGAA APVFAPWRESDKDLSI^K-PSSRSSAAYVRHWTGDSAYTHNLLSRRYG 229 



Query: 63 ITAAQLDGFLQSTGITYDSSRIDGQKIIJJREKSSGLnARAIIAIAIAESSLGTQGVATAP 122

ITA QLDGFL S GI YD R++G+++L+ EK +GLD RAI+AIA+AESSLGTQGVA
Sbjct: 230 ITAEQLDGFMSLGIHYDKERIiNGItRLLEWEKLTGLDVRAIVAIAMAESSLGTQGVAKEK 289

Query: 123 GMmFGFGAVDNNTTNAQNFSDDKAVIKMTQETIIQNQNTSFAIQDQKAQFLSTGNLNVA 182 

G+NMFG+GA D N NA+ +SD+ A+ M ++TII N+K 4F QD KA+ S G L+ 
Sbjct: 290 GSNMFGYGAFDFNPNNAKKYSDF^IRHMVEDTIIANKNOTFERQDLKAKKWSLGQtiDTL 349 

Query: 183 ARGGVYFTDASGSGKRPAAIMESIDKWIDAHGGISSISKELMTSSVAMMAVPTSySVSR 242 

GGVYFTD SGSG+RRA IM +D+WID HG +1 + L TS VP Y S+ 

Sbjct: 350 IDGGVYFTDTSGSGQRRADIMTKLDQKIDDHG.MTPDIPEHLKITSGTQFSEVPVGYKRSQ 409 

Query: 243 ANQAGNYVAGTYPWGQRTW 261 

Y + TY +GQ TW 
Sbjct: 410 PQNVLTYKSETYS FGQCTW 428 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1987> which encodes the amino acid 
sequence <SEQ ID 1988>. Analysis of this protein sequence reveals the following: 

Possible site: 22 

>>> Seems to have a cleavable N-term signal seq. 

Final Results

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 143/265 (53%) , Positives = 184/265 (68%) , Gaps = 5/265 (1%) 

Query: 2 VPSQPQVTATPQKSEWTPA ITSGIDLPDVAIPTAMASAAYVKHVIIGNDAYTHNL 56 

V+P+ + QETP S +DI. ++ IP+ AAYV+HW G +AYTH+L 

Sbjct: 135 VDTAPASSLSKQLPEARTPIQSLSPYVSDLDLSEIDIPSVNTYAAYVEHWSGJ3IAYTHHL 194 

Query: 57 LSHRYGITAAQLDGFLQSTGITYDSSRIDGQKILDREKSEGLDARAIIAIAIAESSLGTQ 116 

LS RYGI A Q+D +L+STGI YDS+RI+G+K+L EK SGLD RAI +AIA++ESSLGTQ 
Sbjct: 195 LSRRYGIKADQIDSYLKSTGIAYDSTRINGEKLLQWEKKSGLDVRAIVAIAMSESSLGTQ 254 

Query: 117 GVATAPGANMFGFGAVDNNTTNAQNFSDDKAVIKTCTQETIIQNQNTSFAIQDQKAQFLST 176 

G+AT GANMFG+ A D + T A F+DD A++KMTQ+TII+N+N++FA+QD KA S 
Sbjct: 255 GIATLLGANMFGYAAFDLDPTQASKFNDDSAIVKMTQDTIIKNKNSNFALQDLKAAKFSR 314 

Query: 177 GNr J NVAARGGVYFTDASGSGKRRAAIMESIDKWIDAHGGISEIS?CELIOTSSVA^l^IAVP^ 236 

GLHA+ GGVYFTD +GSGKRRA IME +DKWID HGG 1 EL SS + +VP 
Sbjct: 315 GQLNFASDGGVYFTDTTGSGKRRAQIMEDLDKKIDDHGGTPAIPAELKVQSSASFASVPA 374 

Query: 237 SYSVSRANQAGNYVAGTYPWGQRTW 261 

Y +S++ Y A +Y WGQ TW 

Sbjct: 375 GYKLSKSYDVLGYQASSYAWGQCTW 399 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 642 

A DNA sequence (GBSx0682) was identified in S.agalactiae <SEQ ID 1989> which encodes the amino 
acid sequence <SEQ ID 1990>. Analysis of this protein sequence reveals the following: 

Possible site: 27

>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

A related GBS gene <SEQ ID 8627> and protein <SEQ ID 8628> were also identified. Analysis of this
protein sequence reveals the following:

Lipop: Possible site: -1 Crend: 4
McG: Discrim Score: 11.56
GvH: Signal Score (-7.5): 0.870001
Possible site: 27
>>> Seems to have a cleavable N-term signal seq.
ALOM program count: 0 value: 11.88 threshold: 0.0
PERIPHERAL Likelihood = 11.88 63
modified ALOM score: -2.88

*** Reasoning Step: 3

Final Results

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

SEQ ID 8628 (GBS159) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 28 (lane 4; MW 26kDa). It was also expressed in E.coli as a GST-fusion
product. SDS-PAGE analysis of total cell extract is shown in Figure 34 (lane 2; MW 41kDa).

GBS159-GST was purified as shown in Figure 198, lane 9.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 643 

A DNA sequence (GBSx0683) was identified in S.agalactiae <SEQ ID 1991> which encodes the amino
acid sequence <SEQ ID 1992>. Analysis of this protein sequence reveals the following:

Possible site: 32

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2668 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:BAB04699 GB:AP001510 unknown conserved protein [Bacillus halodurans]
Identities = 32/76 (42%), Positives = 54/76 (70%)

Query: 7 LGSVIELKNDSQKVMITSRFPLYDNEGQLGYFDYSGCIFPISIVGNETYFFNLEDIDKVL 66

+GS++ LK 4 K+MI +R P+ + G+ FDYSGC +P +V ++ ++FN E+ID+V+
Sbjct: 4 IGSIVYLKEGTSKLMILNRGPILEANGENKMFDYSGCFYPQGLVPDKVFYFNHENIDEW 63

Query: 67 FEGYYDENEEEMQKIF 82 

FEG+ D+ E+ QK4F 
Sbjct: 64 FEGFQDDEEQRFQKLF 79 
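
In BLAST pairwise output, the middle line repeats the residue wherever query and subject are identical and prints "+" for a positively scoring substitution. The identity count for a row can therefore be recomputed directly from the two sequence rows. The sketch below does this for the final alignment row above; `count_identities` is a hypothetical helper, and the count covers only this 16-residue row, not the whole 76-residue alignment.

```python
def count_identities(query_row, sbjct_row, gap="-"):
    """Count positions where both aligned rows carry the same residue."""
    if len(query_row) != len(sbjct_row):
        raise ValueError("aligned rows must have equal length")
    identical = sum(
        1 for q, s in zip(query_row, sbjct_row) if q == s and q != gap
    )
    return identical, len(query_row)


# Final alignment row of the listing above.
query = "FEGYYDENEEEMQKIF"  # Query: 67 - 82
sbjct = "FEGFQDDEEQRFQKLF"  # Sbjct: 64 - 79

identical, length = count_identities(query, sbjct)
print(identical, length)
```

Positives cannot be recomputed this way without the substitution matrix that was used, which the listing does not name.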

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 644

A DNA sequence (GBSx0684) was identified in S.agalactiae <SEQ ID 1993> which encodes the amino 
acid sequence <SEQ ID 1994>. Analysis of this protein sequence reveals the following: 

Possible site: 32

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood =-14.81 Transmembrane 75 - 91 ( 69 - 99)
INTEGRAL Likelihood =-14.38 Transmembrane 134 - 150 ( 129 - 179)
INTEGRAL Likelihood = -8.49 Transmembrane 157 - 173 ( 151 - 179)
INTEGRAL Likelihood = -1.17 Transmembrane 50 - 66 ( 46 - 67)

Final Results

bacterial membrane --- Certainty=0.6922 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
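
The INTEGRAL lines above list predicted membrane-spanning segments in order of likelihood rather than position. A short sketch of how they could be parsed and re-sorted along the sequence follows; `parse_segments` is a hypothetical helper, and the field names in the returned dictionaries are illustrative choices, not terms from the prediction program.

```python
import re

_TM = re.compile(
    r"Likelihood\s*=\s*(-?\d+(?:\.\d+)?)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)


def parse_segments(text):
    """Return predicted segments sorted by start position."""
    segments = []
    for lik, start, end, lo, hi in _TM.findall(text):
        segments.append({
            "likelihood": float(lik),
            "start": int(start),
            "end": int(end),
            "range": (int(lo), int(hi)),  # bracketed range from the listing
        })
    return sorted(segments, key=lambda seg: seg["start"])


example = """
INTEGRAL Likelihood =-14.81 Transmembrane 75 - 91 ( 69 - 99)
INTEGRAL Likelihood =-14.38 Transmembrane 134 - 150 ( 129 - 179)
INTEGRAL Likelihood = -8.49 Transmembrane 157 - 173 ( 151 - 179)
INTEGRAL Likelihood = -1.17 Transmembrane 50 - 66 ( 46 - 67)
"""
segments = parse_segments(example)
print([(seg["start"], seg["end"]) for seg in segments])
```

Lines whose positions were lost in extraction, such as a "Transmembrane" entry with no coordinates, do not match the pattern and are left out rather than filled in.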

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 645 

A DNA sequence (GBSx0685) was identified in S.agalactiae <SEQ ID 1995> which encodes the amino 
acid sequence <SEQ ID 1996>. Analysis of this protein sequence reveals the following: 

Possible site: 35

>>> Seems to have no N-terminal signal sequence

Likelihood = -0.11 Transmembrane 40 - 56 ( 40 - 56)

Final Results

bacterial membrane --- Certainty=0.1044 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

SEQ ID 1996 (GBS204) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 49 (lane 13; MW 32kDa) and Figure 53 (lane 2; MW 14.7kDa). It was also 
expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell extract is shown in Figure 54 
(lane 6; MW 39.7kDa). 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 646 

A DNA sequence (GBSx0686) was identified in S.agalactiae <SEQ ID 1997> which encodes the amino 
acid sequence <SEQ ID 1998>. Analysis of this protein sequence reveals the following: 



Final Results

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAC16670 GB:AJ302698 hypothetical protein [Staphylococcus 
haemolyticus] 

Identities = 60/254 (23%) , Positives = 109/254 (42%) , Gaps = 14/254 (5%) 

Query: 2 VCTSVSSVGTQASTVAISMFSRVSALNDAITKL-SSFAEAA'TLQGTAYSNAKSYATGTLTP 61 

+ + V +Q+S V++SS+ +FA+LQGAY + K+ ++ P 

Sbjct: 3 IDMYVGKSKSQSSDVGSTVKS1SSGYDSLQKGIMQFVGASELCGQAYDSGKQFFSAVIAP 62 

Query: 62 MLQGMILFSETLSEKCTELQTLYVSICGDEDLDSWLESKLASDRASLKIAEALLEHLND 121 

+ + + E+C+ YS +L L+ + EA+ L 

Sbjct: 63 LTESIKTLGELTEQACNDFVDQYQSEVDSQSLKESELLEDIEELNKQISQLEAMNASLKH 122 

Query: 122 DPEPSKSAISSTKSNIKKLKKEIKSNQKK1DNI^FNAHSA^VFADIS^3AQSTVNQALAA 181 

+ S +S I L+++ K ++KL L +F4A S +F ++ + Q TV Q + 

Sbjct: 123 KSSKNSSLLSGNHQMISSLEQQKKELEEKLRKLRQFDAKSPNIFKEVESFQKTVQQGIHQ 182 

Query: 182 VSTGFSGYNSKTGAFGKPTSGQMEWTKTVIOCNWKEREDAI^AEELKSICKAEESKKASKIEN 241 

T ++ F P MEW K ++ E K +++ ++KA++ KK SK + 
Sbjct: 183 AKT AWDPGKQTFNI PAGKDMEWAKVSQQKALE VKMDKI -NQKAKDGKKLSKNDI 235 

Query: 242 TT KKSNV 248 

T KKSN+ 
Sbjct: 236 FTI IAYQQQKKSNI 249 

No corresponding DNA sequence was identified in S.pyogenes. 

SEQ ID 1998 (GBS270) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 51 (lane 2; MW 34.3kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 54 (lane 7; MW 59.2kDa). 

The GBS270-GST fusion product was purified (Figure 206, lane 3) and used to immunise mice. The 
resulting antiserum was used for FACS (Figure 265), which confirmed that the protein is immunoaccessible 
on GBS bacteria. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 647 

A DNA sequence (GBSx0687) was identified in S.agalactiae <SEQ ID 1999> which encodes the amino 
acid sequence <SEQ ID 2000>. This protein is predicted to be outer surface protein F. Analysis of this 
protein sequence reveals the following: 

Possible site: 23

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3323 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

SEQ ID 2000 (GBS316) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 51 (lane 3; MW 23kDa). It was also expressed in E.coli as a GST-fusion
product. SDS-PAGE analysis of total cell extract is shown in Figure 55 (lane 2; MW 41.8kDa).

GBS316-GST was purified as shown in Figure 206, lane 4.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 648 

A DNA sequence (GBSx0688) was identified in S.agalactiae <SEQ ID 2001> which encodes the amino
acid sequence <SEQ ID 2002>. This protein is predicted to be actin-like protein arp3 (act4). Analysis of this
protein sequence reveals the following:

Possible site: 17

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0217 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 649 

A DNA sequence (GBSx0689) was identified in S.agalactiae <SEQ ID 2003> which encodes the amino
acid sequence <SEQ ID 2004>. This protein is predicted to be diarrheal toxin. Analysis of this protein
sequence reveals the following:

Possible site: 25

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -8.65 Transmembrane 65 - 81 ( 61 - 84)
INTEGRAL Likelihood = -3.98 Transmembrane 89 - 105 ( 85 - 106)

Final Results

bacterial membrane --- Certainty=0.4461 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:CAB15175 GB:Z99120 alternate gene name: yueA-similar to
hypothetical proteins [Bacillus subtilis]
Identities = 452/1058 (42%), Positives = 664/1058 (62%), Gaps = 39/1058 (3%)

Query: 98 OTMIFSITGYFKNRKQYKQDLQERIDSYHDYLSDKSIELQKLAKEQKRGQHYHYPTIEGL 157 

+T+I S YF+++ Q K+ ++R Y YL +K ELQ LA++QK+ +H+P+ E + 
Sbjct: 1 MTLITSTVQYFRDKNQRKKREEKRERVYKLYLDNKRICELQALAEKQKQVLEFHFPSFEQM 60 

Query: 158 QEMADTYHHRIYEKTPLHFDFLYYRLGLGEVPTSYNIHYSQPERSGKK-DPLENEGYNLY 216

+ + RI+EK+ D+L RLG G VP+SY I+S + + + DL + ++

Sbjct: 61 KYLTSEISDRIWEKSLESKDYLQLRLGTGTVPSSYEINMSGGDIANRDIDDLMEKSQHMQ 120

Query: 217 FNNRYIKNMPIVANLSHGPVGYIGPRGLVLEQLQLM\ r NQLAFFHSYHDVQFITIVPEEEM 276

- I+N P+ +L+ GP+G +G +V ++ ++ QL+FF+SYHD++F+ I EEE



+GF+YN+++RDQ+L+SL ++++ +R+ + KE F P 
ifAKGFIYNEQTRDQLLSSLYELIR ERDLEDDKEKLQFKP 236 



H+V ++T+++LI +HVI+E+ 



H! SRL Iffl + +SIPE V+F+E++ A+E + 



LSLAV+ FHPH+ AFLLIDYKGGGMA F+++PHLLGTITN++G++ 



VNHIN Y K YK G+ MPHLFLISDEFAELKS +P+F++ELVS ARI 



GRSLG+HLILATQKP G++DDQIWSNSRFK+ALKV D DS Eh-L DAA IT GR Y 



LQVGNNEVYELFQSAWSGADYQPEKDDQGIEDHTIYSIHDLGQYEILNDDLSGLDQAENI 754 
LQVGNNEVYELFQSAWSGA YE G ED I + D G LS +D +N 

LQVGNNEVYELFQSAWSGAPYLEEV- - YGTEDE - 1 AI VTDTGLI PLSEVDTEDNA 647 

-KEVPTELDAIVENIQALTKEMGISDLPQPWLPPI.SNQIAVTDLRKEGSVDLMSKAPSYK 813 

K+V TE++A+V+ 1+ + EMGI LP PWLPPL+ +1 T h+ 

KKDVQTEIEAWDEIERIQDEMGIEKLPSPWLPPLAER1PRT LFPSWEKDH 698 



S A T M A +PE L++Y 



+FDFG LLPL +LPH AD+F +D KI KF+ RIK E+ RK+ 



3 +P I I ID+++ + 



L+ NLK +1 L D SE ++ GR + +E IPGR +I+++++ Q+ L 



+P IP++PE Ii+ + 



There is also homology to SEQ ID 24. 
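
The percentages printed in the alignment header above can be checked against the raw counts. The following is a minimal sketch, assuming that each percentage is truncated rather than rounded, and that the Positives figure is the truncated identity percentage plus the truncated percentage of conservative substitutions (positives that are not identities). This reading is an inference from the figures printed in this document, not a documented property of the alignment program, and the function names are hypothetical.

```python
def truncated_percent(count: int, length: int) -> int:
    """Integer percentage with the fractional part dropped."""
    return (100 * count) // length


def header_percentages(identities: int, positives: int, gaps: int, length: int) -> dict:
    """Recompute the three header percentages from the raw counts."""
    ident_pct = truncated_percent(identities, length)
    # Conservative substitutions are the positives that are not identities.
    conservative_pct = truncated_percent(positives - identities, length)
    return {
        "identities": ident_pct,
        "positives": ident_pct + conservative_pct,
        "gaps": truncated_percent(gaps, length),
    }


# Counts taken from the header "452/1058, 664/1058, 39/1058".
print(header_percentages(452, 664, 39, 1058))
# {'identities': 42, 'positives': 62, 'gaps': 3}
```

Under these assumptions the recomputed values agree with the 42%, 62% and 3% given in the header.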

A related GBS gene <SEQ ID 8629> and protein <SEQ ID 8630> were also identified. Analysis of this
protein sequence reveals the following:

Homology to a bacterial toxin
The protein has homology with the following sequences in the databases:

>OMNI|NT01BS3725 diarrheal toxin

Score = 
Identitie 

Query: 1 MGISDLPQPWLPPLSNQIAVTDLRKEG3VDLWSKRPSYKAVLGFMDIPSQQAQEVAYHDF 60 

MGI LP PWLPPL+ +1 T L+ ++D P Q Q + 

Sbjct: 704 MGIEKLPSPWLPPLAERIPRT LFPSNEKBHFHFAYVDEPDLQRQAPIAYKM 754 

Query: 61 EDDGHLSIFAGPSMGKSTALQTVTMDIARHNSPEFIiHLYLFDFGTNGLLPLRRLPHVADF 120 

+DG++ IF GKS A T M A +PE L++Y+FDFG LLPL +LPH AD+ 

Sbjct: 755 MEDGNIGIFGSSGyGKSIAAATFI.MSFADVYTPEELHVYIFDFGNGTLLPLAKLPHTADY 814 

Query: 121 FTIDDDEKIAKFIARIKVEMSDRKKALSRYNVATAKLYRQVSGETMPQIL1VIDSYEGLR 180 

F +D KI KF+ RIK E+ RK+ ++ K+Y +S E +P I I ID+++ ++ 

Sbjct: 815 FLMDQSRKIEKFMIRIKEEIDRRKRLFREKEISHIKMYNALSEEELPFIF1TIDNFDIVK 874 

Query: 181 EAQTPTNLEACFQNISRDGSSLGISLVISAGRTAALRSSLMAHLKERIALKLTDDSESRT 240 

+ LE+ F +SRDG SLGI +++A R A+R SL+ NLK +1 L D SE + 

Sbjct: 875 DEM--HELESEWQLSRDGQSLGIYFMLTATRVmWQSLLHMLKTKIVHYLMDQSEGYS 932 

Query: 241 LVGRHQHIMEDIPGRGLIKRDDIEVLQVALSTEGTETFDIINNIQNESDAMNSKWTG-PR 299 

+ GR + +E IPGR Q+ L + + + N ++++ + ++ + 

Sbjct: 933 IYGRPKFNLEPIPGRVIIQKEELYFAQMFLPVDADDDIGMFNELKSDVQKLQGRFASMEQ 992 

Query: 300 PKAI PIVPEELTFDDFMATDSVQADLSANRL- - PLGLEMVDVESYSLALNRFKHMLYMSD 357 

P IP++PE L+ + S++ L L P+GL V L + KH L + 

Sbjct: 993 PAPIPMLPESLSTREL SIRFKLERKPLSVPIGLHEETVSPVYFDLGKHKHCLILGQ 1048 

Query: 358 SDESLEAVGSHI IKVLL 374 

+ ++++KV+L 
Sbjct: 1049 TQRG KTNVLKVML 1061 

SEQ ID 8630 (GBS326) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 65 (lane 5; MW 66kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 71 (lane 5; MW 91kDa). 
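
The two apparent molecular weights reported above for GBS326 (66kDa as a His-fusion, 91kDa as a GST-fusion) can be compared with the size of the fusion partner. The following is a minimal sketch, assuming a GST moiety of approximately 26 kDa and neglecting the small contribution of the His tag; the constant, the tolerance and the function names are illustrative assumptions and are not values stated in this document.

```python
GST_TAG_KDA = 26.0  # approximate mass of the GST moiety (assumption)


def tag_mass_difference(his_fusion_kda: float, gst_fusion_kda: float) -> float:
    """Difference between the GST-fusion and His-fusion apparent masses."""
    return gst_fusion_kda - his_fusion_kda


def consistent_with_gst(his_fusion_kda: float, gst_fusion_kda: float,
                        tolerance_kda: float = 3.0) -> bool:
    """True if the mass difference is within tolerance of the assumed GST mass."""
    difference = tag_mass_difference(his_fusion_kda, gst_fusion_kda)
    return abs(difference - GST_TAG_KDA) <= tolerance_kda


# Values reported for GBS326: His-fusion 66kDa, GST-fusion 91kDa.
print(tag_mass_difference(66, 91))
print(consistent_with_gst(66, 91))
```

A difference of 25 kDa falls within the assumed tolerance, which suggests that both bands correspond to the same GBS326 polypeptide carrying different tags.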

GBS326-GST was purified as shown in Figure 212, lane 5. 

GBS326LN was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell extract is
shown in Figure 127 (lanes 2-4; MW 114kDa). It was also expressed in E.coli as a GST-fusion product.
SDS-PAGE analysis of total cell extract is shown in Figure 184 (lane 6; MW 114kDa). The purified protein
is shown in Figure 236, lane 12.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 650 

A DNA sequence (GBSx0690) was identified in S.agalactiae <SEQ ID 2005> which encodes the amino 
acid sequence <SEQ ID 2006>. Analysis of this protein sequence reveals the following: 

d N- terminal signal sequence 

Final Results

bacterial cytoplasm --- Certainty=0.2693 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 651 

A DNA sequence (GBSx0691) was identified in S.agalactiae <SEQ ID 2007> which encodes the amino 
acid sequence <SEQ ID 2008>. Analysis of this protein sequence reveals the following: 

Possible site: 38
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3933 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 652 

A DNA sequence (GBSx0692) was identified in S.agalactiae <SEQ ID 2009> which encodes the amino 
acid sequence <SEQ ID 2010>. Analysis of this protein sequence reveals the following: 

Possible site: 55
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -7.32 Transmembrane 225 - 241 ( 219 - 246)

Final Results

bacterial membrane --- Certainty=0.3930 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:BAB04693 GB:AP001510 unknown conserved protein [Bacillus halodurans]
Identities = 83/320 (25%), Positives = 162/320 (49%), Gaps = 1/320 (0%)

Query: 103 VNFILHPSNLFLTKNATAKIAYRSLPGIMRPEKFGPEEFLYQFKCFVFALLTQHDYIELy 162 

++ 1+ P N+ ++ + + + P + PE + + + LL + Y 

Sbjct: 106 LHLIVSPENVLVSDGLDVTFIHYGVKDSIPPYETDPERLFLELRATLLVLLDGNHRFHEY 165 

Query: 163 NGAISVIEVSDFLKSIYHAETIQAVRDIITIDYEQQVEVETHTLAKVSRAKYKLYKYISV 222
+++S KS+ T++ +R++I + Q+ E + L KV + K+ + K+ +
Sbjct: 166 MNYHDTLKLSPFAKSLVQQTTLEGLRELIR-HWIQEHEQQEKQLHKVPKTKWTIQKWAGI 224

Query: 223 WLGALSTILLIPLvYLVFIHNPFKKKMLAADTSFlKVDYNQVINRLEHVKVSKLPYTQKY 282
LA +1 +VY++ P +E A+ +++ +Y+QVI+ LE + +P KY
Sbjct: 225 GLIAALVPAIIYIVYVl^FLQPRQEaFTASHAAYLNENYSQVIDTLEPYSPNSMPRWKY 284

Query: 283 ELAYSYINGMSFSEEQREVIIiNNVTLKTDELYLDYWINIGRGLDDDAIDAAKRLDDSDLV 342
+LA SY+ RE + N + L+ E Y DYWI IGRG ++ AID A+ L D + +
Sbjct: 285 QIAQSOTAIEPLQAYHRENLKNVLVLQAAESYFDYWIAIGRGENEKAIDIARGLQDKEWL 344

Query: 343 IYAIVQKMDQWKDNSLSGKDREQKLSELQTDYDKYWKDRKTALTDEESKSKNSNNHSTN 402
+YA V++ ++V+ D +LSGK+RE + E4+ + D Y ++ + + E+ N+ ++N
Sbjct: 345 VYANVKRREEVKSDEMLSGKEREDLIKEIEAEIDDYMRELEEIAEEGEAFQPNAEPAASN 404

Query: 403 SNKESSESSSTTASTSSKTK 422

+E + S + + K 
Sbjct: 405 ELEEDEGDTEEDDSDNQEAK 424 

No corresponding DNA sequence was identified in S.pyogenes. 

SEQ ID 2010 (GBS337) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 62 (lane 3; MW 50.3kDa).

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 653 

A DNA sequence (GBSx0693) was identified in S.agalactiae <SEQ ID 2011> which encodes the amino
acid sequence <SEQ ID 2012>. Analysis of this protein sequence reveals the following:

Possible site: 27

>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood =-14.01 Transmembrane 131 - 147 ( 122 - 153)

Final Results

bacterial membrane --- Certainty=0.6604 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 8631> which encodes amino acid sequence <SEQ ID 8632> 
was also identified. Analysis of this protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 8
McG: Discrim Score: 13.38
GvH: Signal Score (-7.5): -1.25

Possible site: 23
>>> Seems to have a cleavable N-term signal seq.
ALOM program count: 1 value: -14.01 threshold: 0.0

INTEGRAL Likelihood =-14.01 Transmembrane 127 - 143 ( 118 - 149)
PERIPHERAL Likelihood = 16.13 113

modified ALOM score: 3.30

*** Reasoning Step: 3

Final Results

bacterial membrane --- Certainty=0.6604 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

SEQ ID 8632 (GBS140) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 32 (lane 3; MW 43kDa). It was also expressed in E.coli as a His-fusion product.
SDS-PAGE analysis of total cell extract is shown in Figure 49 (lane 8; MW 18kDa).

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 654

A DNA sequence (GBSx0694) was identified in S.agalactiae <SEQ ID 2013> which encodes the amino 
acid sequence <SEQ ID 2014>. Analysis of this protein sequence reveals the following: 

Possible site: 15
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1486 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 655

A DNA sequence (GBSx0695) was identified in S.agalactiae <SEQ ID 2015> which encodes the amino 
acid sequence <SEQ ID 2016>. Analysis of this protein sequence reveals the following: 



Possible site: 32

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood =-14.59 Transmembrane 984 - 1000 ( 976 - 1009)
INTEGRAL Likelihood = -9.71 Transmembrane 19 - 35 ( 15 - 42)
INTEGRAL Likelihood = -9.50 Transmembrane 872 - 888 ( 865 - 890)
INTEGRAL Likelihood = -6.37 Transmembrane 927 - 943 ( - 951)
INTEGRAL Likelihood = -4.19 Transmembrane 831 - 847 ( 828 - 847)
INTEGRAL Likelihood = -2.87 Transmembrane 899 - 915 ( 899 - 916)

Final Results

bacterial membrane --- Certainty=0.6838 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



A related GBS nucleic acid sequence <SEQ ID 8633> which encodes amino acid sequence <SEQ ID 8634> 
was also identified. Analysis of this protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 6
SRCFLG: 0
McG: Length of UR: 20
Peak Value of UR: 3.40
Net Charge of CR: 3
McG: Discrim Score: 13.67
GvH: Signal Score (-7.5): -3.27

Possible site: 21
>>> Seems to have an uncleavable N-term signal seq
Amino Acid Composition: calculated from 1
ALOM program count: 6 value: -14.59 threshold: 0.0

INTEGRAL Likelihood =-14.59 Transmembrane 973 - 989 ( 965 -
INTEGRAL Likelihood = -9.71 Transmembrane - 24 ( 4 -
INTEGRAL Likelihood = -9.50 Transmembrane - 877 ( 854 -
INTEGRAL Likelihood = -6.37 Transmembrane - 932 ( 913 -
INTEGRAL Likelihood = -4.19 Transmembrane - 835 ( 817 -
INTEGRAL Likelihood = -2.87 Transmembrane - 904 ( 888 -
PERIPHERAL Likelihood = 3.82
modified ALOM score: 3.42
icm1 HYPID: 7 CFP: 0.684

*** Reasoning Step: 3

Final Results

bacterial membrane --- Certainty=0.6838 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAB86324 GB:AE000938 phage infection protein homolog 
[Methanothermobacter thermoautotrophicus] 
Identities = 96/454 (21%), Positives = 190/454 (41%), Gaps = 63/454 (13%) 

Query: 1 MLKIKyiLGRIMKR-^FRILWYIIAVALPLvAIAGLffliKLQGDHAKENKTTQSATNTKL 59 

M K I + MK N ++ ++IAV + + A+ + +Q ++T+ + 
Sbjct: 1 MRKALE I FWKDMKT VKNS P WLFVI AVI I CI PALYAV- FNIQATLDPYSRTSS 1 53 

Query: 60 NIALVNEDQMVSNGKESyNLGASYIKSIERDNSQNWSWSRGTAQNGLDKGDYQLMVIIP 119 

+A+VNED N+GA ++ + ++ + +W V R A +GL KG Y ++IIP 

Sbjct: 54 EVAVVNEDMGADFNGTHLNVGAEFVSELRKNRNFDWQFTORSDAMDGI^KGKYYAVLIIP 113 

Query: 120 MFSQKLLDWKANAEQTTISY»/NAKGN1^3^ 179 

NFS LL + Q +1 Y VN K N + + +++NS +V + 

Sbjct: 114 GNFSSDLLSIKNGTPRQASIKYMVITOKIiNPVAPRITmGADALQAKINSEVVKTIDGIVF 173 

Query: 180 SNLYTAQENVQA MVNVQSGNI SNYQKNLLDSATNF QNI FPAL 221 

+ A E +A VN +GN+ + L + ++ QN++ +L 

Sbjct: 174 GKISFAGEIARANRDDILRTKRFVNELNGNLGKIDETLSTANSDLEKGQNLWSSLKTDLP 233 

Query: 222 -VNQSSSSITANESLKKS LEASDNMFNDLVTTQTNTGKDLSSL 263 

+ +++ + SL +S +++ ++ ++ +T+ L+SL 

Sbjct: 234 EIRDNANFVKEKYSLLESYIGKDPAKALSTVQSKESHLSEAITSMKYLRAVLASLYSATG 293 

Query: 264 IEQRHQDSISYEAFSTSLLEMNNELLEKQLSDIITQAQKDQETLSSQLNSIMG 316 

I+Q + + L + ++L K +D I + + + + S LN +M 

Sbjct: 294 DPKLKTAIDQIDTNIEKASSVLGILQTIESDLKTKGTTDRIVKLKASIDRMDSALNKLMD 353 

Query: 317 D-DNNHNHKENSSAYLNVARQKIQELSEALKSQDNIAKDQSEQLDKIVREGLASYFAKNN 375 

D +++SA Jj +A + + A+ +D S +L+ I + L S + 

Sbjct: 354 SRDEIDAAMQDASAKLGIANARWPTMRSAI QDASRKLNMISDDDLNSLVKLAD 406 

Query: 376 KDNITLLELLKSHSTNEK TLKDFKAKVADF 405 

D + E +S EK +K++ + +A F 

Sbjct: 407 IDPSAVREYFRSPVRMEKEHIYPVKNYGSALAPF 440 

SEQ ID 8634 (GBS250) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 47 (lane 4; MW 136kDa). 

GBS250-GST was purified as shown in Figure 203, lane 4. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 656 

A DNA sequence (GBSx0696) was identified in S.agalactiae <SEQ ID 2019> which encodes the amino 
acid sequence <SEQ ID 2020>. Analysis of this protein sequence reveals the following:

Possible site: 39

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.5009 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:CAA46375 GB:X65276 ORFA1 [Clostridium acetobutylicum]
Identities = 35/91 (38%), Positives = 53/91 (57%)

Query: 1 ^QIKLTPEELRSSAQKYTAGSQQVTEVLNLLTQ3QAVIDENWDGSTFDSFEAQENELSP 60

MAQI +TPEEL+S AQ Y +++ + ++ +IEWGF++ Q+M+L
Sbjct: 1 MAQISVTPEELKSQAQVYIQSKEEIDQAIQKVNSMNSTIAEEWKGQAFQAYLEQYNQLHQ 60

Query: 61 KITEFAQLLEDINQQIiLKVADI IEQTDADIA 91
+ +F LLE +NQQL K AD + + DA A

Sbjct: 61 TWQFENLLESVNQQLNKYADTVAERDAQDA 91

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 657 

A DNA sequence (GBSx0697) was identified in S.agalactiae <SEQ ID 2021> which encodes the amino 
acid sequence <SEQ ID 2022>. Analysis of this protein sequence reveals the following: 

Possible site: 22
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3741 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 658 

A repeated DNA sequence (GBSx0698) was identified in S.agalactiae <SEQ ID 2023> which encodes the 
amino acid sequence <SEQ ID 2024>. This protein is predicted to be carbamoylphosphate synthetase 
(carB). Analysis of this protein sequence reveals the following: 

Possible site: 23

>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood = -1.33 Transmembrane 807 - 823 ( 807 - 823)

Final Results

bacterial membrane --- Certainty=0.1532 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
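
Across the membrane predictions reported in Examples 649 to 658, the membrane Certainty value tracks the most negative INTEGRAL Likelihood almost linearly. The following is a minimal sketch of that relationship, assuming an intercept of 0.1 and a slope of 0.04 per likelihood unit. Both coefficients are an empirical fit to the figures printed in this section; they are not taken from the prediction program and should not be read as a description of how it computes its result. The function name is hypothetical.

```python
def fitted_certainty(most_negative_likelihood: float) -> float:
    """Empirical fit: 0.1 + 0.04 * |likelihood| (coefficients are assumptions)."""
    return round(0.1 + 0.04 * abs(most_negative_likelihood), 4)


# (most negative INTEGRAL Likelihood, reported membrane Certainty)
reported = [
    (-8.65, 0.4461),   # SEQ ID 2004
    (-7.32, 0.3930),   # SEQ ID 2010
    (-14.01, 0.6604),  # SEQ ID 2012
    (-14.59, 0.6838),  # SEQ ID 8634
    (-1.33, 0.1532),   # SEQ ID 2024
]

for likelihood, certainty in reported:
    fitted = fitted_certainty(likelihood)
    print(likelihood, certainty, fitted, round(abs(fitted - certainty), 4))
```

The residuals stay within a few ten-thousandths, which is compatible with the likelihoods having been rounded to two decimal places before printing.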

The protein has homology with the following sequences in the GENPEPT database:

>GP:CAA03928 GB:AJ000109 carbamoylphosphate synthetase [Lactococcus
lactis]

Identities = 771/1062 (72%), Positives = 901/1062 (84%), Gaps = 5/1062 (0%)

Query: 1 MPKRTDIRKI^W'IGSGPIVIGOA^FDYSGTC^CLSLKEEGYQVVLVNSNPATIMTDKDI 60
MPKR DI+KIM+IGSGPI+IGQAAEFDY+GT+ACL+LKEEGY+WLVNSNPATIMTD++I

Sbjct: 1 MPKRNDIKKIMIIGSGPIIIGQAAEFDYAGTEACLftLKEEGYEVVLVNSNPATIMTDREI 60

Query: 61 ADKVYIEPITLEFVTRILRKERPDALLPTLGGQTGL1#IAMAL£KNQILEELNVELLGTKL 120

AD WIEPITLEFV++ILRKERPD2UjLPTLGGQTGLNMAM LSK GILEELNVELLGTKL 
Sbjct: 61 ADTVYIEPITLEWSKILRKERPDALLPTLGGQTGLNMAMELSKTGILEEIjNVELLGTKL 120 

Query: 121 SAIDKAEDRDLFKQLMEELNQPIPESEIVNS'/EEAIQFAEQIGYPLIVRPAFTLGGTGGG 180 

SAID+AEDR+LFK+L E +N+P+ S+I +VEEAI A++IGYP+IV PAFT+GGTGGG 
Sbjct: 121 SAIDQAEDRELFKELCES INEPLCASDIATTVEEAINIADKIGYPI IVGPAFTMGGTGGG 180 

Query: 181 MCDNQEQLVDITTKGLKLSPVTQClilERSIAGFKEIEYEVMRDAADNALWCHMEIIFDPV 240 

+CD +E+L +1 GLKLSPVTQCLIE SIAG+KEIEYEVMRD+ADNA+WCNMENFDPV 
Sbjct: 181 ICDTEEELREIVANGLKLSPVTQCLIEESIAGYKEIEYEVMRDSADNAIWCNMENFDPV 240 

Query: 241 GIHTGDSIVFAPAQTLSDVENQLLRDASLDIIRALKIEGGCNVQLALDPNSFKYYVIEVN 300 

G+HTGDS IVFAP+QTLSD E Q+LRDASL+IIRALKIEGGCNVQLALDP1JS++Y VIEW 
Sbjct: 241 GVHTGDSIVFAPSQTLSDNEYQMLRDASI.NIIRALKIEGGCNVQLALDPNSYEYRVIEYN 3 00 

Query: 301 PRVSRSSALASKATGYPIAKLAAKIAVGLTLDEVINPITKTTYAMFEPALDYWAKMPRF 350 

PRVSRSSALASKATGYPIAK++AKIA+G+TLDE+INP+T TYAMFEPALDYWAK+ RF 
Sbjct: 301 PRVSRSSALASKATGYPIAKMSAKIAIGMTLDEIINPVTNKTYAMFEPALDYWAKIARF 3 60 

Query: 361 PFDKFESGDRKLGTQMKATGEVMAIGRKIKESLLKACRSLEIGVDHIKIADLDNVSDDVL 420 

PFDKFE+GDR LGTQMKATGEVMAIGRNIEESLLKA RSLEIGV H ++ + D+ L 

Sbjct: 361 PFDKFENGDRHLGTQMKATGEVmiGRN-IEESLLKAVRSLEIGVFHWEMTEAIEADDEKL 420 

Query: 421 LEKIRKMDDRLFYLAEALRRHYSIEKLASLTSIDSFFLDKLRVIWLEDLLSKNRLDIN 4SO 

EK+ K +DDRLFY+ +EA+RR IE++A LT ID FFLDKL IVE+E+ L N + 
Sbjct: 421 YEKMVKTQDDRLFYVSEAIRRGIPIEEIADLTKIDIFFLDKLLYIVEIENQLKVNIFEPE 480 

Query: 481 ILKKA/KHKGFSDKAIASLWQINEDQVRNMRKEAGILPVYKMVDTCASEFDSATPYFYSTY 540 

+LK K GFSD+ IA LW + ++VR R+E I+PVYKMVDTCA+EF+S+TPYFYSTY 
Sbjct: 481 LLKTAKKNGFSDREIAKLW^JV^PEEVRRRRQENKIIPVYK^nroTCAAEFESSTPYFYSTY 540 

Query: 541 AVENESLISDKASILVLGSGPIRIGQGVEFDYATVHSVKAIRESGFEAIIMNSNPETVST 600 

ENES SDK I +VLGSGP IRIGQGVEFDYATVH VKAI+ G EAI++NSNPETVST 
Sbjct: 541 EWEI^SKRSDKEKIIVLGSGPIRIGQGVEFDYATVHCVKAIQAI^KEAIVINSNPETVST 600 

Query: 601 DFSISDKIYFEPLTFEDVMNVIDLEKPEGVILQFGGQTAINLAKDLMKAGVKILGTQLED 660 

DFSISDKLYFEPLTFEDVMNVIDLE+P VI+QFGGQTAINLA+ L+KAGVKILGTQ+ED 
Sbjct: 601 DFSISDKLYFEPLTFEDVMNVIDLEEPLWIVQFGGQTAINLAEHLSKAGVKILGTQVED 650 

Query: 661 LDRAENRKQFEATLQALNIPQPPGFTATTEEEAVNAAQKIGYPVLVRPSYVLGGRAMKIV 720 

LDRAE+R FE LQ L+IPQPPG TAT EEEAV A KIGYPVL+RPS + VLGGRAM+ 1 + 
Sbjct: 661 LDRAEDRDLFEKALQDLDIPQPPGATATNEEEAVANANKIGYPVLIRPSFVLGGRAMEII 720 

Query: 721 ENEEDLRHYMTTAVKASPDHPVLIDAYLIGKECEVDAISDGQNILIPGIMEHIERSGVHS 780 

NE+DLR YM AVKASP+HPVL+D+YL G+ECEVDAI DG+ +L+PGIMEHIER+GVHS 
Sbjct: 721 1©[EKDLRDYMNRAVKASPEHPVLVDSYLQGQECEVDAICDGKEVLLPGIMEHIERAGVHS 780 

Query: 781 GDSMAVYPPQTLSETIIETIVDYTXRIAIGLNCIGWMIQmKDQKVYVIEVNPRASRT 840 

GDSMAVYPPQ LS+ II+TIVDYTKRLAIGLNCIGMMNIQFVI +++VYVIEVNPRASRT 
Sbjct: 781 GDSMAVYPPQWLSQAIIDTIVDYTKRLAIGLNCIGKMNIQFVIYEEQVYVIEVNPRASRT 840 

Query: 841 LPFLSKOTHIPMAQVATKVILGDKLCNFTYGYDLYPASDI^IKAPVFSFTKLAKVDSLL 900 

+PFLSKVT+IPMAQ+AT++ILG+ L + Y LP DMVH+KAPVFSFTKLAKVDSLL 
Sbjct: 841 VPFLSKVTNIP^QLATQMILGEKLKDLGYEAGLAPTPDMVHVKAPVFSFTKLAKVDSLL 900 

Query: 901 GPEMKSTGEVMGSDINLQKALYKAFEAAYLHMPDYGKIVFTVDDTDKEEALEIiAKVYQSI 9S0 

GPEMKSTG MGSD+ L+KALYK+FEAA LHM DYG+++FTV D DKEE L LAK + I 
Sbjct: 901 GPEMKSTGLAMGSDA7TLEKALYKSFEAAKLHMADYGSVLFTOADEDKEETLALAKDFAEI 960 

Query: 961 GYRIYATQGTAIYFDANGLETVLVGKL — GENDRNHIPDLIKNGKIQAVINTVGQKNID- 1017 

GY + AT GTA + NGL V KL GE++ + + 1+ G++QAV+NT+G 
Sbjct: 961 GYSLVATAGTAAFLKENGLYTOEVEKIAGGEDEEGTLVEDIRQGRVQAVVNTMGNTRASL 1020 



Query: 1018 - -HHDALI IRRSAIEQGVPLFTSLDTAHAMFK 

D IR+ AI +G+PLFTSLDT A+ KV++SR+FT K 
Sbjct: 1021 TTATDGFRIRQEAISRGIPLFTSLDTVAAIIiKVMQSRSFTTK 1062

A related DNA sequence was identified in S.pyogenes <SEQ ID 2025> which encodes the amino acid
sequence <SEQ ID 2026>. Analysis of this protein sequence reveals the following:

Possible site: 21
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -1.17 Transmembrane 773 - 789 ( 773 - 789)

Final Results

bacterial membrane --- Certainty=0.1468 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

>GP:CAA03928 GB:AJ000109 carbamoylphosphate synthetase [Lactococcus
lactis]

Identities = 753/1030 (73%), Positives = 876/1030 (84%), Gaps = 6/1030 (0%)

Query: 1 LALKEEGYKVILVNSNPATIMTDKEIADKVYIEPLTLEFVNRIIRKERPDAILPTLGGQT 60
LALKEEGY+V+LVNSNPATIMTD+EIAD VYIEP+TLEFV++I+RKERPDA+LPTLGGQT
Sbjct: 35 LALKEEGYEWLVNSNPATIMTDREIADTVYIEPITLEFVSKILRKERPDALLPTLGGQT 94

Query: 61 GLNMAMALSKAGILDDLEIELLGTKLSAIDQAEDRDLFKQLMQELDQPIPESTIVKTVDE 120
GLNMAM LSK GIL++L +ELLGTKLSAIDQAEDR+LFK+L + +++P+ S I TV+E
Sbjct: 95 GIjNMAMELSKTGILEELW/ELLGTKLSAIDQAEDRELFKELCESINEPLCASDIATTVEE 154

Query: 121
Sbjct: 155
Query: 181
Sbjct: 215
Query: 241
Sbjct: 275
Query: 301
Sbjct: 335
Query: 361
Sbjct: 395
Query: 421
Sbjct: 455
Query: 481
Sbjct: 515
Query: 541
Sbjct: 575
Query: 601
Sbjct: 635
Query: 661
Sbjct: 695
Query: 721



A+ A IGYP+IV PAFT+GGTGGGIC +EEEL EI NGLKLSPVTQCLIE SIAG+K 



EIEYEVMRDSADNA+WCNMENFDPVG+HTGDSIVFAP+QTLSD e qmlrdasl iira 



I.KIEGGCNVQLALDP S++Y VIEWPRVSRSSALASKATGYPIAK++AKIA+G+TLDE+ 



INP+T TYAMFEPALDYWAKT RFPFDKFE+G+R LGTQMKATGEVMAIGRN+EESLL 



KA RSLEIGV HNEMT DE+L K++K QDDRLFY+SEAIRRG IEE+ LTKI 



D+FFLDKLL+IVEIE +L++++ E LK AK+ GFSD++IA++W 



LYPVYKMVDTCAAEFDAKTPYFYSTYELENESVQSNKES ILVLGSGPIRIGQGVEFDYAT 54 0 
4 PVYKMVDTCAAEF++ TPYFYSTYE ENES +S+KE I+VLGSGPIRIGQGVEFDYAT 
IIPVYKMVDTCAAEFESSTPYFYSTYEKENESKRSDKEKIIVLGSGPIRIGQGVEFDYAT 574 



VH VKAIQ G EAI++NSNPETVSTDFS+SDKLYFEPLTFEDVMNVIDLE+P VIVQF 



GGQTAINLA+ LS+AGV ILGTQVEDLDRAEDRDLFEKAL++L IPQP G TATNEEEA+ 



A KIG+PVL+RPS+VLGGRAMEI + N++DLR+Y+ AVKASPEHP+LVDSY+ G+ECE

Query: 781 GMMNVQFVIKNEQVWIEWNPI^SRTVPFLSKVTGIPMAQIATKLILGQTLKDLGYEDGL 840

GMMN+QFVI EQVYVIEVNPRASRTVPFLSKVT IPMAQ+AT++ILG+ LKDLGYE GL 
Sbjct: 815 GMMNIQFVIYEEQWVIEWPRASRTVPFLSKVTNIPMAQLATQMILGENLKDLGYEAGL 874 

Query: 841 yPQSPLVHIKAPVFSFTf<IAQ\T)3LLGPEMKSTGEVMG3DTSLEKALYKAFEANNSHLSE 900 

P +VH+KAPVFSFTKIA+VDSLLGPEMKSTG MGSD 4-LEKALYK+FEA H+++ 
Sbjct: 875 APTPDMVHVKAPVFSFTKLAK\T33LLC-P3MKSTGL^1GSDVTLEKALYKSFEAAKLH^ 934 

Query: 901 FGQIVFTIADDSKAEALSLARRFKAIGYQIMATQGTAAYFAEQGLSACLVGKIGDAANDI 960 

+G ++FT4-AD+ K E L+LA+ F IGY ++AT GTAA+ E GL V K+ ++ 
Sbjct: 935 YGSVLFTVADEDKEETLALAKDFAE I GYSLVAXAGTAAFLKENGLYVREVEKLAGGEDEE 994 



Query: 961 PTLV RHGHVQAIVNTVGIKR TADKDGQMIRSSAIEQGVPIiFTALDTAKAMLTVL 1014
TLV R G VQA+VNT+G R T DG IR AI +G+PLFT+LDT A+L V+
Sbjct: 995 GTLVEDIRQGRVQAW1<ITMGNTRASLTTATDGFRIRQEAISRGIPLFTSI,DTVAAILKVI4 1054



Query: 1015 ESRCFNIEAI 1024 

+SR F +1 
Sbjct: 1055 QSRSFTTKNI 1064 
Identities = 141/389 (36%), Positives = 222/389 (56%), Gaps =

Query: 518
Sbjct: 8
Query: 578
Sbjct: 68
Query: 632
Sbjct: 120
Query: 692
Sbjct: 188
Query: 750
Sbjct: 243
Query: 808
Sbjct: 303
Query: 862
Sbjct: 368



)KIGYPIIVGPAFTMGGTGGGICDTEEE 187 



VIEVNPR SR+ 



308 ALASKATGYPIAKMSAKIAIGMTLDI 



D LG +MK+TGEVM 



--PQSPIA7HIKAPVFSFTKLAQ 861 



CTYAMFEPALDYWAKIARFPFDKFEN 367 



An alignment of the GAS and GBS proteins is shown below: 

Identities = 777/1025 (75%), Positives = 896/1025 (86%), Gaps = 1/1025 (0%)

Query: 35 LSLKEEGYQWLVNSNPATIMTDKDIADKVYIEPITLSFVTRILRKERPDALLPTLGGQT 94 

L+LKEEGY+V+LVNSNPATIMTDK+IADKVYIEP+TLEFV RI+RKERPDA+LPTLGGQT 
Sbjct: 1 IALKEEGYKVILVNSNPATIMTDKEIADKVYIEPLTLEFVNRIIRKERPDAILPTLGGQT 60 

Query: 95 GLM^AMALSKNGILEELNVELIX^KLSAIDKAEDRDLFKQLMEEraQPIPESEIWSVEE 154 

GLMMAMALSK GIL++L +ELLGTKLSAID+AEDRDLFKQLM+EL+QPIPES IV +V+E 
Sbjct: 61 GLl^IAMALSICaGItiDDLEIELLGTKLSAIDQAEDRDLFKQLMQELDQPIPESriVKTVDE 120 

Query: 155 AIQFAEQIGYPLIVRPAFTLGGTGGGMCDNQEQLVDITTKGLKLSPVTQCLIERSIAGFK 214 

A+ FA IGYP+IVRPAFTLGGTGGG+C +4-E+L +IT GLKLSPVTQCLIERSIAGFK 
Sbjct: 121 AVTFARDIGYPVIVRPAFTLGGTGGGICESEEELCEITENGLICLSPVTQCLIERSIAGFK 180

Query: 215 EIEYEVMRDAADNALWCMMENFDPVGIHTGDSIVFAPAQTLSDVENQLLRDASLDIIRA 274

EIEYEVMRD+ADNALWCNME3OTPVGIHTGDSIVFAP QTLSD+ENQ+LRDASL IIRA 
Sbjct: 181 EIEYEVMRDSADNALWCNMENFDPVGI HTGDS I VFAPTQTLSDIENQMLRDASLKI IRA 240 

Query: 275 LKIEGGC^QUUJDPNSFK^WIKWPRVSRSSAIjASKATCYPIAKLAAKIAVGLTLDEV 334 

LKIEGGCNVQIALDP SFICYYVIEVNPRVSRSSA1ASKATGYPIAKLAAKIAVGLTLDE+ 
Sbjct: 241 LKIEGGCNVQIALDPYSFKYWIEWPRVSRSSALASKATGYPIAKLAAKIAVGLTLDEM 300 

Query: 335 INPITKTTYAMFEPALDYWAKIIPRFPFDKFESGDRKIiGTQMKATGEVMAIGRNIEESLL 394 

INPIT TTYAMFEPALDYWAK+PRFPFDKFE G+R+LGTQMKATGEVMAIGRN+EESLL 
Sbjct: 301 INPITGTTYAMFEPALDYWAKIPRFPFDKFEHGERQLGTQMKATGEVMAIGRNLEESLL 360 

Query: 395 KACRSLEIGVDHIKIADLDIWSDDVLLEKIRKAEDDRLFYIjREALRRHYSIEKLASLTSI 454 

KACRSLEIGV H ++ L N+SD+ L+ K+ KA+DDRLFYL+EA+RR YSIE+L SLT I 
Sbjct: 361 KACRSLEIGVCHNEMTSLSNISDEELVTKVIKAQDDRLFYLSEAIRRGYSIEELESLTKI 420 

Query: 455 DSFFLDKLRVIVELEDLLSKNRLDINILKKVKNKGFSDKAIASLWQINEDQVRNMRKEAG 514 

D FFLDKL 1VE+E L + + LKK K GFSD+ IA +WQ +E +R MR 
Sbjct: 421 DLFFLDKLLHIVEIEQELQMHVDHLESLKKAKRYGFSDQKIAEIWQKDESDIRAMRHSHS 480 



Sbjct: 481 LYPVYKMVDTCAAEFDAKTPYFYSTYELENESVQSNKESILVLGSGPIRIGQGVEFDYAT 540 

Query: 575 VHSVKAIRESGFEA1 IMNSNPETVSTDFSI SDKLYFEPLTFEDVMNVIDLEKPEGVILQF 634 

VHSVKAI+++G+EAIIMNSNPETVSTDFS+SDKLYFEPLTFEDVMNVIDLE+P+GVI+QF 
Sbjct: 541 VHSVKAIQKAGYEAIIMNSNPETVSTDFSVSDKLYFEPLTFEDVMNVIDLEQPKGVIVQF 600 

Query: 635 GGQTAINLAKDLNKAGVKILGTQLEDLDRAENRKQFEATLQALNIPQPPGFTATTEEEAV 694 

GGQTAINLA+ L++AGV ILGTQ+EDLDRAE+R FE L+ L IPQP G TAT EEEA+ 
Sbjct: 601 GGQTAINLAQALSEAGVTILGTQVEDIiDRAEDRDIiFEKALKELGIPQPQGQTATNEEEAL 660 

Query: 695 NAAQKIGYPVLWPSYVLGGRAMKIVEISMEDIiRHYMTTAVKASPDHPVLIDAYLIGKECE 754 

AA+KIG+PVLVRPSYVLGGRAM+IVEN+EDLR Y+ TAVKASP+HP+L+D+Y+' GKECE 
Sbjct: 661 EAAKKIGFPVLWPSYVLGGRAMEIVENKEDLREYIRTAVKASPEHPILVDSYIFGKECE 720 



VDAISDG+++LIPGIMEHIER4GVHSGDSMAVYPPQ LS+ I ETI +YTKRLAIGLNCI 



YP S +WIKAPVFSFTKLA+VDSLLGPEMKSTGEVMGSD +L+KALYKAFEA H+ + 



Query: 755 
Sbjct: 721 

Query: 815 GMMNIQFVIKDQK^WIEWPRASRTLPFLSiOTHIPMAQVATKVILGDKLCNFTYGYDL 874 

GMMN+QFVIK+++VYVIEVNPRASRT+PFLSKVT IPMAQ+ATK+ILG L + Y L 
Sbjct: 781 GMMNVQFVIKNEQVYVIEVNPRASRTVPFLSKVTGIPMAQIATKLILGQTLKDIjGYEDGL 840 

Query: 875 

Sbjct: 841 

Query: 935 YGNIVFTVDDTDKEEALELAKVYQSIGYRIYATC^TAIYFDANGLETVLVGKLGENDRKH 994 

+G IVFT+ D K EAL LA+ ATQGTA YF GL LVGK+G+ N 

Sbjct: 901 FGQIVFTIADDSKAEALSLARRFKAIGYQIMATQGTAAYFAEQGLSACLVGKIGD-AAiro 959 

Query: 995 IPDLIKKGKIQAVIMVGQNN1DNHDALIIRRSAIEQGVPLFTSLDTAHAMFKVLESRAF 1054 

IP L+++G +QA++NTVG + D +IR SAIEQGVPLFT+LDTA AM VLESR F 

Sbjct: 960 IPTLVRHGH^/QAIVHTVGIKRTADKIJGQ.^IRSSAIEQGVPLFTALDTAKAMLTVLESRCF 1019 

Query: 1055 TLKVL 1059 



Query: 10 IMVIGSGPIVIGQAAEFDYSGTQACLSLKEEGYQWLVNSNPATIMTDKDIADKVYIEPI 69 

I+V+GSGPI IGQ EFDY+ + ++++ GY+ +++NSNP T+ TD ++DK+Y EP+ 
Sbjct: 520 ILVLGSGPIRIGQGVEFDYATVHSVKAIQKAGYEAIIMNSNPETVSTDFSVSDKLYFEPL 579 

Query: 70 TLEFVTRlBRKERPDALLPTLGGQTGtNMAMAI^KNGIIiEELNVEIjLGTKLSAIDKAEDR 129 

T E V ++ E+P ++ GGQT +N+A ALS+ G V +LGT++ +D+AEDR 

Sbjct: 580 TFEDVMNVIDIiEQPKGVIVQFGGQTAINIAQALSEAG VTILGTQVEDL0RAEDR 633

Query: 130
Sbjct: 634
Query: 190
Sbjct: 694
Query: 250
Sbjct: 752
Query: 310
Sbjct: 810
Query: 370
Sbjct: 864



h YVIEVNPR SIM- 



IS P+A++A K+ +G TL + 



5 +MK+TGEVM 



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 659 

A DNA sequence (GBSx0699) was identified in S.agalactiae <SEQ ID 2027> which encodes the amino 
acid sequence <SEQ ID 2028>. This protein is predicted to be carbamoyl phosphate synthetase small 
subunit (carA). Analysis of this protein sequence reveals the following: 
Possible site: 19

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2401 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:CAB89872 GB:AJ132624 carbamoyl phosphate synthetase small
subunit [Lactococcus lactis]
Identities = 242/355 (68%), Positives = 305/355 (85%)





Query: 2
Sbjct: 3
Query: 62
Sbjct: 63
Sbjct: 123
Query: 182
Sbjct: 183
Query: 242
Sbjct: 243
Query: 302



G+NRDDYESI PTCK W++E A PSNWR QM+ DEFLK K IPGI+G+DTRA+TKI+R 



SILRELS+R+C++TWP+ T+A+EIL + PDGV+L+NGPG+P +P A++MI+E+QGKIP 
S ILRELSKRECNLTWPVNTSAKE ILEMEPDG VMLTNGPGDPTDVPEAIEMI KEVQGKI P 2 

IFGICMGHQLFAKANGAKiyKMTFGHRGFNHAVRHLQTGQVDFTSQNHGyAVSREDFPEA 3 
IFGIC+GHQLF+ ANGA TYKM FGHRGFNHAVR + TG++DFTSQNHGYAVS E+ PE 
! I FGI CLGHQLFSIANGATTyKMKFGHRGFNHAWEVATGRIDFTSQNHGyAVSSENLPED 3 

Query: 302 LFITHEEINDKTVEGVRHICYYPAFSVQFHPDAAPGPHDTSYLFDEFINMIDDFQQ 356
L ITH EIND +VEGVFJKKY+PAFSVQFHPDAAPGPHD SYLFD+F++++D+F++
Sbjct: 303 LMITHVEIKDNSVEGVRHKYFPAFSVQFHPDAAPGPHDASYLFDDFMDLMDNFKK 357

A related DNA sequence was identified in S.pyogenes <SEQ ID 2029> which encodes the amino acid 
sequence <SEQ ID 2030>. Analysis of this protein sequence reveals the following: 

Possible site: 43

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3534 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below:

Identities = 265/354 (74%), Positives = 309/354 (86%)

Query: 2 KRLLLLEDGSVFEGEAFGADVETSGEIVFSTGMTGYQESITDQSYNGQIITFTYPLIGNY 61
KRLL+LEDG++FEGE FGAD++ +GEIVF+TGMTGYQESITDQSYNGQI+TFTYPLIGNY
Sbjct: 3 KRLLILEDGTIFEGEPFGADIDVTGEIVFNTGMTGYQESITDQSYNGQILTFTYPLIGNY 62

Query: 62 GINRDDYESIRPTCKGWIYEWAEYPSNWRQQMTLDEFLKLKGIPGISGIDTRALTKIIR 121
GINRDDYESI PTCKGW+ E + SNWR+QMTLD FLK+KGIPGISGIDTRALTKIIR

Sbjct: 63
Sbjct: 123
Query: 182
Sbjct: 183
Sbjct: 243
Query: 302
Sbjct: 303



L +THE+INDKTVEGV+H+ +PAFSVQFHPDAAPGPHD SYLFDEF+ MID ++ 
LMVTHEDINDKrVEGVKHRDFPAFSVQFHPDAAFGPHDAS YLFDEFLEMIDSWR 356 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 660 

A DNA sequence (GBSx0700) was identified in S.agalactiae <SEQ ID 2031 > which encodes the amino 
acid sequence <SEQ ID 2032>. This protein is predicted to be aspartate carbamoyltransferase (pyrB). 
Analysis of this protein sequence reveals the following: 

N-terminal signal sequence 

Final Results

bacterial cytoplasm --- Certainty=0.3260 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:AAF72727 GB:AF264709 aspartate transcarbamoylase [Enterococcus 
faecalis] 

Identities = 197/303 (65%), Positives = 250/303 (82%)

Query: 5 TQTLSLEHFVSLEELSNQEVMSLIKRSISVKENPSNIGFDKDYYVSNLFFENSTRTHKSF 64
++ +SL+H ++ E L+++EVM LI+R+ E K+ ++ Y+ +NLFFENSTRTHKSF

Sbjct: 5 SERISLKHLLTAEaLTDREVNlGLIRRAGEFKCGAIO-fflFEERQYFATNLFFENSTRTHKSF 64



E+AE KLGL+ IEF A SSV KGETLYDT+LTMSA+G+DV VIRH +YY ELI S 



+ 1 +NGGDGSGQHP+Q LLDL+TIYEEFG F+GLK+AIVGD+THSRVAKSHMQ+L RL 



AIIMHPAPVNRDVE+A +LVE+ ++RIV QMSNGV+ R+AILEA+L 



A related DNA sequence was identified in S.pyogenes <SEQ ID 2033> which encodes the amino acid 
sequence <SEQ ID 2034>. Analysis of this protein sequence reveals the following: 

>>> Seems to have a cleavable N-term signal seq.






Final Results

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 208/300 (69%) , Positives = 249/300 (82%) 

Query: 68 ELKLGLKTIEFNADTSSVNKGETLYDTILTMSALGLDVCVIRHPDIDYYKELIASPNIHS 127

E KLGL 4-+FNAD S+VNKGE+LYDT+LTMSALG D+CVIRHP+ DYYKEL+ SP I +
Sbjct: 86 EKOiGLT^DFNMJASAVNKGESLYDl^TMSALGTDICVIRHPEDDYYKELVESPTITA 145

Query: 128 AIVNGGDGSGQHPSQSLLDLVTIYEEFGYFKGLKIAIVGDLTHSRVAKSNMQVLKRLGAE 187

+IVNGGDGSGQHPSQ LLDL+TIYEEFG F+GLKIAI GDLTHSRVAKSNMQ+LKRLGAE
Sbjct: 146 SIVNGGDGSGQHPSQCLLDLLTIYEEFGRFEGLKIAIAGDLTHSRVAKSNMQILKRLGAE 205

Query: 188 IFFSGPKEWYSSQFDEYGQYLPIDQLVDQIDVLMLLRVQHERHDGKGVFSKESYHQQFGL 247

++F GP+EWYS F+ YG Y+ IDQ++ ++DVLMLLRVQHERHDG FSKE YHQ FGL
Sbjct: 206 LYFYGPEEWYSEAFNAYGTYIAIDQIIKELDVLMLLRVQHERHDGHQSFSKEGYHQAFGL 265

Query: 248 TKERYKHLRDTAIIMHPAPVNRDVEIASDLVEADKARIVKQMSNGVYARIAILEAVLNSR 307

T+ERY+ L+D+AIIMHPAPVNRDVEIA LVEA KARIV QM+NGV+ R+AI+EA+LN R
Sbjct: 266 TQERYQQLKDSAIIMHPAPVNRDVEIADSLVEAPKARIVSQMANGVFVRMAIIEAILNGR 325



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics.

Example 661

A DNA sequence (GBSx0701) was identified in S.agalactiae <SEQ ID 2035> which encodes the amino 
acid sequence <SEQ ID 2036>. Analysis of this protein sequence reveals the following: 

Possible site: 30 
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2392 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database: 



Query: 11 IIKNGLIIDPQSGFNQVSDMLIDQGKIKQISKEIDIKGIPIIDASNKIVAPGLVDIHVHF 70

I+KNG +IDP D+L++ GKIK+I K I + I IDA IV PG +DIHVH 

Sbjct: 5 IVKNGYVIDPSQNLEGEFDILVENGKIKKIDKNILVPEAEIIDAKGLIVCPGFIDIHVHL 64 

Query: 71 REPGQTHKENIHTGALSAAVGGFTT\rL>TIAKTlvrPTI3SPEIVKQVKESAAKEAI-KIETV 129 

R+PGQT+KE+I +G+ A GGFTT4+ M NTNP I + +V + + + + ++ 
Sbjct: 65 RDPGQTYKEDIESGSRCAVAGGFTTIVCMPNTNPPIDNTTVVNYILQKSKSVGLCRVLPT 124 

Query: 130 ATITKSLNGKI)LTOFEELLEAGVAGFSDDGIPLTDTKVLQEA>MLARKHDVvIjSLHEEDP 189 

TITK GK++ +F L EAG F+DDG P+ D+ V+++A+ LA + V + H ED 
Sbjct: 125 GTITKGRKGKEIADFYSLKEAGCVAFTDDGSPVMDSSVMRKALELASQLGVPIMDHCEDD 184 

Query: 190 SIM-GVLGIHEHIAQKIYHVCGASGLAEYSMIARDAMIAYQTQAIWHIQHLSSSESVEW 248 

L GV INE + + + AE IARD ++A +T VHIQH+S+ S+E++ 

Sbjct: 185 KIAYGV--IIffiGEVSAIJ^LSSRaPEAEEIQIARDGILAQRTGGHVHIQHVSTKLSLEII 242 

Query: 249 DFAQKLGANLTAEVTPQHFSKTENLLLTKGANAKLNPPLRLEKDRQALIDGLKSGVISII 308

+ p + + G +T EV P H TE +L GANA++NPPLR ++DR ALI+G+K G+I 
Sbjct: 243 EFFKEKGVKITCEVNPNHLLFTEREVIiNSGANARWPPLRKKEDRLAIjIEGVKRGIIDCF 302 

Query: 309 ASDHAPHHIMEKAADNISQAPSGMTGLETSLALGITYLVSTKEIjSMIDFLAKMTCNPAQL 368

A+DHAPH EK + + A G+ GL+T+L + L +S+ + T KPA++ 

Sbjct: 303 ATDHAPHQTFEK--ELVEFAMPGIIGLQTALPSRLE-LYRKGIISLKKLIEMFTINPARI 359

Query: 369 YGFDAGYLREGGPADIVIFDQAEERIIKAEF-ASKSSNSPFIGDKLKGVIHYTICNGEIV 427 

G D G L+ G PADI IFD +E 1+ E SKS N+P G LKG + YTI +G+4V 
Sbjct: 360 IGVDLGTLKLGSPADITIFDPKKEWII^ETI^SKSRin'PLWGKVLKGKVIYTIKDGKIW 419 

Query: 428 YQ 429 

Sbjct: 420 YK 421 

A related DNA sequence was identified in S. pyogenes <SEQ ID 2037> which encodes the amino acid
sequence <SEQ ID 2038>. Analysis of this protein sequence reveals the following: 

Possible site: 35 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -0.80 Transmembrane 76 - 92 ( 76 - 92) 
INTEGRAL Likelihood = -0.00 Transmembrane 286 - 302 ( 286 - 302) 



Final Results

bacterial membrane --- Certainty=0.132 (Affirmative) < succ>

bacterial outside --- Certainty=0.000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases:

!GB:AE000708 dihydroorotase [Aquifex aeolicus] 315 3e-85

>GP:AAC06948 GB:AE000708 dihydroorotase [Aquifex aeolicus] 
Score = 315 bits (801) , Expect = 3e-85 

Identities = 177/422 (41%) , Positives = 254/422 (59%) , Gaps = 






+++KNG V+DP + D+L++ +1 KI I EA++IDA GLIV PG +DIHVH 



R+PGQT+KEDI +G+ A AGG TT+V M NTNP I + 



++T+ GK++ DF +L EAG V+F+DDG P+ 



Query: 181 PQL-KGVLGFNEGIAEEHFHFCGATGVAEYSMIARDVMIAYDRQAHVHIQHLSKAESVQV 239

+L GV+ NEG AE IARD ++A HVHIQH+S S+++

Sbjct: 184 DKLAYGVI--NEGEVSALLGLSSRAPEAEEIQIARDGILAQRTGGHVHIQHVSTKLSLEI 241



; K+T EV+P H TE +L +G +A++NPPIJR. + DRLA+IEG+K G+I 



ATDHAPH EK + + A G+ GL+T+L L L G ++L L+E T+NPA 



PAD+ IF +E ++ E SK+ N+P G LKG V YTI DG4+ 



An alignment of the GAS and GBS proteins is shown below: 

Identities = 269/420 (64%) , Positives = 338/420 (80%) 



HFREPGQTHKE+IHTGAL+AA GG TTV+MMANTTIP IS E +++V SAAKE I I T 



A++T++ NGKD+ +F+ LIiEAG FSDDGIPL +KVL+EA +LA 








P LNGVLG NE IA++ +H CGA+G+AEYSMIARD MIAY QA VHIQHLS +ESV+W 



FAQ+LGA +TAEV+PQHFS TE+LMi G +AK+NPPLR ++DR A+I+GLKSGVI++I 



A+DHAPHH EK D++++APSGMTGLET3L-LG+T+LV L+++ L KMT NPA L 



Query: 369 YGFDAGYLREGGPADIVIFDQAEERIIKAEFASKSSNSPFIGDKLKGVIHYTICNGEIVY 428

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 662 

A DNA sequence (GBSx0702) was identified in S.agalactiae <SEQ ID 2039> which encodes the amino 
acid sequence <SEQ ID 2040>. This protein is predicted to be orotate phosphoribosyltransferase PyrE 
(pyrE). Analysis of this protein sequence reveals the following: 

N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2214 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAC95453 GB:AF068902 orotate phosphoribosyltransferase PyrE 
[Streptococcus pneumoniae] 
Identities = 152/208 (73%), Positives = 180/208 (86%) 

Query: 1 MDLARQIAMELLDIQAVYLRPQQPFTWASGVKSPIYTDNRVTLSYPETRTLIENGFVKQI 60

M LA+ IA LL IQAVYL+P++PFTWASG+KSPIYTDNRVTL+YPETRTLIENGFV I
Sbjct: 1 MTMKDIASHLLKIQAVYLKPEEPFTWASGIKSPIYTDNRVTLAYPETRTLIENGFVDAI 60

Query: 61 QKHFPNVDIIAGTATAGIPHGAIIADKMNLPFAYIRSKAKDHGVGNQIEGRVYSGQKMVI 120

++ FP V++IAGTATAGIPHGAIIADKMNLPFAYIRSK KDHG GNQIEGRV GQKMV+
Sbjct: 61 KEAFPEVEVIAGTATAGIPHGAIIADKMNLPFAYIRSKPKDHGAGNQIEGRVAQGQKMVV 120

Query: 121 IEDLISTGGSVLFAVTAAQSC^IF^GWAIFTYQl^KAEQAFREADIPLVTLTDYNQLI 180 

+EDLISTGGSVLEAV AA+ +G +VLGWAIF+YQL KA++ F +A + LVTL++Y++M 
Sbjct: 121 VEDLISTGGSVLFoAVAAAKREGAJ3VLGWAIFSYQLPI<ADKNFADAGVI<IiVTLSNYSELI 180 

Query: 181 KVAKVNGYITADQLVLLKKFKEDQMNWQ 208 

+A+ GYIT + L LLK+FKEDQ NWQ 
Sbjct: 181 HIAQEEGYITPEGLDLLKRFKEDQENWQ 208 

A related DNA sequence was identified in S. pyogenes <SEQ ID 2041> which encodes the amino acid
sequence <SEQ ID 2042>. Analysis of this protein sequence reveals the following: 

N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.1612 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 158/208 (75%), Positives = 179/208 (85%) 

Query: 1 MDLARQIAMELLDIQAVYLRPQQPFTWASGVKSPIYTDNRVTLSYPETRTLIENGFVKQI 60 

M LA QIA +LLDI+AVYL+P+ PFTWASG+KSPIYTDNRVTLSYP+TR LIENGFV+ I
Sbjct: 1 MTLASQIATQLLDIKAVYLKPEDPFTWASGIKSPIYTDNRVTLSYPKTRDLIENGFVETI 60

Query: 61 QKHFPNVDIIAGTATAGIPHGAIIADKMNLPFAYIRSKAKDHGVGNQIEGRVYSGQKMVI 120

+ HFP V++IAGTATAGIPHGAIIADKM LPFAYIRSK KDHG GNQIEGRV GQKMVI
Sbjct: 61 KAHFPEVEVIAGTATAGIPHGAIIADKMTLPFAYIRSKPKDHGAGNQIEGRVLKGQKMVI 120

Query: 121 IEDLISTGGSVLEAVTAAQSQGIEVLGWAIFTYQLA.KAEQAFREADIPLVTLTDYNQLI 180

IEDLISTGGSVL+A AA +G +VLGWAIFTY+L KA Q P+EA I L+TL++Y +LI 
Sbjct: 121 IEDLISTGGSVLDAAAAASREGADVLGWAIFTYELPKASQNPKEAGIKLITLSNYTELI 180 

Query: 181 KVAKVNGYITADQLVLLKKFKEDQMNWQ 208

VAK+ GYIT D L LLKKFKEDQ+NWQ 
Sbjct: 181 AVAKLQGYITNDGLHLLKKFKEDQVNWQ 208 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 
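
The "Final Results" lines reported for each sequence above can be read mechanically. The following is a minimal sketch, not part of the original analysis software: the helper name parse_final_results and the regular expression are illustrative assumptions. It extracts the predicted location, the certainty value and the call from such lines, tolerating stray spaces inside the number, and then picks the location with the highest certainty.

```python
import re

# Hypothetical pattern for lines such as
# "bacterial cytoplasm --- Certainty=0.2214 (Affirmative) < succ>".
RESULT_RE = re.compile(
    r"bacterial\s+(?P<site>[a-z]+)\W*Certainty\s*=\s*"
    r"(?P<value>[\d\s.]+?)\s*\((?P<call>[^)]*)\)"
)


def parse_final_results(lines):
    """Return {location: (certainty, call)} for every line that matches."""
    results = {}
    for line in lines:
        match = RESULT_RE.search(line)
        if match is None:
            continue
        # Drop spaces that scanning may have introduced inside the number.
        value = float(re.sub(r"\s+", "", match.group("value")))
        results[match.group("site")] = (value, match.group("call").strip())
    return results


example = [
    "bacterial cytoplasm --- Certainty=0.2214 (Affirmative) < succ>",
    "bacterial membrane --- Certainty=0. 0000 (Not Clear) < succ>",
    "bacterial outside --- Certainty=0 . 0000 (Not Clear) < succ>",
]
parsed = parse_final_results(example)
# The location with the highest certainty is taken as the prediction.
best_site = max(parsed, key=lambda site: parsed[site][0])
print(best_site, parsed[best_site])
```

Lines that do not carry a "Certainty=" field, such as "Possible site: 40", are simply skipped by this sketch.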

Example 663 

A DNA sequence (GBSx0703) was identified in S.agalactiae <SEQ ID 2043> which encodes the amino
acid sequence <SEQ ID 2044>. This protein is predicted to be orotidine 5'-phosphate decarboxylase (pyrF).
Analysis of this protein sequence reveals the following:

Possible site: 40 

>>> Seems to have an uncleavable N-term signal seq

Final Results

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9829> which encodes amino acid sequence <SEQ ID 9830>
was also identified.

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAC95452 GB:AF068902 orotidine-5'-decarboxylase PyrF
[Streptococcus pneumoniae]
Identities = 149/231 (64%), Positives = 176/231 (75%), Gaps = 1/231 (0%)

Query: 19 MLEKCPIIALDFSDLASVTTFLEHFPKEELLFVKIGMELYYSEGPSIIRYIKSLGHRIFIj 78 

M E PIIALDF +V FL FP EE L++K+GMELYY+ GP 1+ Y+K LGH +FL 
Sbjct: 1 MREHRPIIALDFPSFEAVKEFLALFPAEESLYLKVGMELYYAAGPEIVSYLKGLGHSVFL 60 

Query: 139 QEQMQVDQHINLSVi/DSVCEIYAQKAQEAC-LDGWASAQEGMQIECKQTNEHFICLTPGIRP 198
+ QMQ Q+I S+ +SV HYA+K EAGLDGW SAQE IK+ TN FICLTPGIRP

Sbjct: 121 EAQMQEFQNIQTSLQESVIHYAKKTAEAGLDGWCSAQEVQVIKQATNPDFICLTPGIRP 180

Query: 199 PQTNQLDDQKRTMTPEQARIVGADYIVVGRPITKAENPYQAYLEIKEEWNR 249
+ DQKR MTP A +G+DYIVVGRPIT+AE+P AY IK+EW +
Sbjct: 181 AGV-AVGDQKRVMTPADAYQIGSDYIVVGRPITQAEDPVAAYHAIKDEWTQ 230

A related DNA sequence was identified in S.pyogenes <SEQ ID 2045> which encodes the amino acid 
sequence <SEQ ID 2046>. Analysis of this protein sequence reveals the following: 

Possible site: 44 
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1934 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below:

Identities = 149/229 (65%), Positives = 180/229 (78%), Gaps = 1/229 (0%)

Query: 19 MLEKCPIIALDFSDLRSvTTFLEHFPKEEDLFVKIGMELYYSEGPSIIRYIKSLGHRIFI, 78 

M E+ PIIALDFS FL+ FP EE L+VKIGMELYY++GP I+RYIKSLGH +FL 

Sbjct: 1 MKEERPIIALDFSSFEETtQ^FLDLFPAEEKLYVKIGMELYYAQGPDIVRYIKSLGHNVFL 60

Query: 79 DLKLHDIPNTVRSSMSVLAKLGIDMTNVHAAGGVEMMKAA^ 138

DLKLHDIPNTVR+ +M+VL +L IDM VHAAGGVEM+KAAREGLG+GP L+AVTQLTSTS
Sbjct: 61 DLKLHDIPNTVRAAMAVLKELDIDMATVHAAGGVEMLKAAREGLGQGPTLIAVTQLTSTS 120

Query: 139 QEQMQVDQHINDSWDSVCHYAQKAQEAGLDGVWVSAQEGMQIKKQmEHFICLTPGIRP 198 

++QM+ DQ+I S+++SV HY++ A +A LDG V SAQE IK T F CLTPGIRP 
Sbjct: 121 EDQMRGDQNIQTSLLESVLHYSKGAAKAQIjDGAVCSAQEVEAIKAVTPTGFTCLTPGIRP 180 

Query: 199 PQTNQUDDQKRTMTPEQARIVGADYIWGRPITKAEMPYQAYLEIKEEW 247 

+N + DQKR MTP QAR +G+DYXWGRPIT+A++P AY IK EW 
Sbjct: 181 KGSN-IGDQKRVMTPNQARRIGSDYIWGRPITQAKDPVAAYQAIKAEW 228 



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.



Example 664 

A DNA sequence (GBSx0704) was identified in S.agalactiae <SEQ ID 2047> which encodes the amino
acid sequence <SEQ ID 2048>. Analysis of this protein sequence reveals the following:



Possible site: 52 



N-terminal signal sequence

INTEGRAL Likelihood = Transmembrane - 404 ( 378 -
INTEGRAL Likelihood = Transmembrane - 309 ( 292 -
INTEGRAL Likelihood = Transmembrane - 181 ( 162 -
INTEGRAL Likelihood = Transmembrane - 283 ( 267 -
INTEGRAL Likelihood = Transmembrane - 130 ( 114 -
INTEGRAL Likelihood = Transmembrane - 334 ( 318 -
INTEGRAL Likelihood = Transmembrane - 156 ( 140 -

Final Results

bacterial membrane --- Certainty=0.4482 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database:



GB:AP001507 unknown conserved protein in others
[Bacillus halodurans]
Identities = 63/243 (25%), Positives = 120/243 (48%)

Query: 5 MSWLRAGKLLIESGAEVYRVEDTMKHFAKALQIENFEAYWSS S 1 1 ASGINRYGKQEAK 64

AG++++ +GAE YRVE+T++ AKA Q N ++V ++ I S +
Sbjct: 8 MDICMLAGEIMLINGAETYRVEETLER^AKAGQFRNVKSFVTTTGIFLSFEEEGAGDVMQ 67



G+FS G +L D+ A + 



GLGE+ +I+G LM +VPG 



+ +SI+ G+A+ I 



Query: 245 EII 247

Sbjct: 248 ALL 250

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 665 

A DNA sequence (GBSx0705) was identified in S.agalactiae <SEQ ID 2049> which encodes the amino 
acid sequence <SEQ ID 2050>. This protein is predicted to be ABC transporter. Analysis of this protein 
sequence reveals the following: 
Possible site: 40 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.5134 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9353> which encodes amino acid sequence <SEQ ID 9354> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAB12571 GB:Z99108 similar to ABC transporter (ATP-binding 
protein) [Bacillus subtilis] 
Identities = 193/288 (67%), Positives = 231/288 (80%) 

Query: 1 MNDVINIVYHVENQDLVRYSGDYTNFESVYAMKKAQLEAAYERQQKEIADLQDFVNRNKA 60

+N VIN++YHVENQ+L RY GDY F VY +KK QLEAAY+ +QQ+E+A+L+DFV RNKA
Sbjct: 222 LNSVINLIYHVENQELTRYVGDYHQFMEVYEVKKQQLEAAYKKQQQEVAELKDFVARNKA 281

Query: 61 RVATRNMAMSRQKKLDKMDIIELQAEKPKPSFEFKESRTPGRFIFQAKDLQIGYDRALTK 120

RV+TRNMAMSRQKKLDKMD+IEL AEKPKP F FK +RT G+ IF+ KDL IGYD L++
Sbjct: 282 RVSTRNMAMSRQKKLDKMDMIELAAEKPKPEFHFKPARTSGKLIFETKDLVIGYDSPLSR 341

Query: 121 PLNLTFERNQKIAIVGANGIGKTTLLKSLLGIIPPISGNVERGDFIDLGYFEQEVPGGNR 180 

PLNL ER QKIA+ GANGIGKTTLLKSLLG I P+ G+VERG+ I GYFEQEV N 
Sbjct: 342 PLNLRMERGQKIALYGANGIGKTTLLKSLLGEIQPL3GSVERGEHIYTGYFEQEVKETNN 401 

Query: 181 QTPLEAVWDAFPAMQAEVEAaiARCGLTSKHIESQIQ^/LSGGEQSK^FCLLMNRENNV 240 

T +E VW FP+ Q E+RAA A+CGLT+KHIES++ VLSGGE++KVR C L+N E N+ 
Sbjct: 402 NTCIEEVWSEFPSYTQYEIRAAPAKCGLTTKHIESRVSVLSGGEKAKVRLCKLINSETNL 461 

Query: 241 LVLDEPTNHLDVDAKDELKRALKAYKGSILMVCHEPDFYEGWMDDVWD 288

LVLDEPTNHLD DAK+ELKRALK YKGSIL++ HEPDFY + W+

Sbjct: 462 LVLDEPTNHLDADAKEELKRALKEYKGSILLISHEPDFYMDIATETWN 509

Identities = 56/219 (25%), Positives = 97/219 (43%), Gaps = 44/219 (20%)

Query: 104 IFQAKDLQIGY-DRALTKPLNLTFERNQKIAIVGANGIGKTTLLKSLLGIIPPISGNVER 162 

I KDL G+ DRA+ ++ + + + ++GANG GK+T + + G + P G VE 
Sbjct: 3 ILSVKDLSHGFGDRAIF1WVSFRLLKGEHVGLIGANGEGKSTFMNIITGKLEPDEGKVEW 62 

Query: 163 GDFIDLGYFEQEVPGGNRQTPLEAVWDAFPALNQAE 198

+ +GY +Q ++ + + DAF L E 

Sbjct: 63 SKNVRVGYLDQHTVLEKGKSIRDVLKDAFHYLFAMEEEMNEIYNKMGEADPDELEKLLEE 122 

Query: 199 VRAALAR CGLTSKHIESQIQVLSGGEQSKVRFCLLMNRENN 239

++ AL GL+ +E + LSGG+++KV L+ + 

Sbjct: 123 VGVIQDALTNNDFYVIDSKVEEIARGLGLSDIGLERDVTDLSGGQRTKVLLAKLLLEKPE 182

Query: 240 VLVLDEPTNHLDVDAKDELKRALKAYKGSILMVCHEPDF 278

+L+LDEPTN+LD + LKR L+ Y+ + +++ H+ F 
Sbjct: 183 ILLLDEPTNYLDEQHIEWLKRYLQEYENAFILISHDIPF 221 



A related DNA sequence was identified in S. pyogenes <SEQ ID 2051> which encodes the amino acid
sequence <SEQ ID 2052>. Analysis of this protein sequence reveals the following: 



Possible site: 14 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2794 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 246/294 (83%) , Positives = 274/294 (92%) , Gaps = 1/294 (0%) 

Query: 1 MNDVINIVYHVENQDLVRYSGDYTNFESVYAMKKAQLEAAYERQQKEIADLQDFVNRNKA 60

+NDVINIVYHVENQ LVRY+GDY F++VY MK++QLEAAYERQQKEIA+LQDFVNRNKA
Sbjct: 233 LNDVINIVYHVENQSLVRYTGDYYQFQAVYEMKQSQLEAAYERQQKEIANLQDFVNRNKA 292

Query: 61 RVATRNMAMSRQKKLDKMDIIELQAEKPKPSFEFKESRTPGRFIFQAKDLQIGYDRALTK 120

RVATRNMAMSRQKKLDKMDIIELQAEKPKP+FEFK++RTP RFIFQ K+L IGYD LTK
Sbjct: 293 RVATRNMAMSRQKKLDKMDIIELQAEKPKPNFEFKQARTPSRFIFQTKNLVIGYDYPLTK 352

Query: 121 -PLNLTFERNQKIAIVGANGIGKTTLLKSLLGIIPPISGNVERGDFIDLGYFEQEVPGGN 179

PLN+TFERNQKIAIVGANGIGK+TLLKSLLG+I P+ G++ GDF+++GYFEQEV G N
Sbjct: 353 EPLNITFERNQKIAIVGANGIGKSTLLKSLLGVIEPLEGHIVTGDFLEVGYFEQEVTGVN 412

Query: 180 RQTPLEAVWDAFPALNQAEVRAALARCGLTSKHIESQIQVLSGGEQSKVRFCLLMNRENN 239

RQTPLE VWDAFPALNQAEVRAALARCGLTSKHIESQIQVLSGGEQ+KVRFCLLMNRENN
Sbjct: 413 RQTPLEVVWDAFPALNQAEVRAALARCGLTSKHIESQIQVLSGGEQAKVRFCLLMNRENN 472

Query: 240 VLVLDEPTNHLDVDAKDELKRALKAYKGSILMVCHEPDFYEGWMDDVWDFNQLS 293

VL+LDEPTNHLD+DAK+ELKRALKAYKGSILMVCHEPDFY GW+ D WDF++L+
Sbjct: 473 VLILDEPTNHLDIDAKEELKRALKAYKGSILMVCHEPDFYNGWVTDTWDFSKLT 526

Identities = 60/218 (27%), Positives = 102/218 (46%), Gaps = 43/218 (19%)

Query: 104 IFQAKDLQIGY-DRALTKPLNLTFERNQKIAIVGANGIGKTTLLKSLLGIIPPISGNVER 162 

I + K L G+ DRA+ + ++ + + I +VGANG GK+T + + G + P G VE 
Sbjct: 15 ILEVKQLSHGFGDRAIFENVSFRLLKGEHIGLVGANGEGKSTFMSrVTGHLQPDEGKVEW 74 

Query: 163 GDFIDLGYFEQEVPGGNRQTPLEAVWDAFPALNQAEVR AALA 204

++ GY +Q + QT + + AF L + E R A++A 

Sbjct: 75 SKYVTAGYLDQHTVLESGQTVRDVljRTAFDELFKTENRIlSrEIYASMADDKADIAVLMEEV 134 

Query: 205 -- - RCGLTSKHIESQIQVLSGGEQSKVRFCLLMNRENNV 240

G+ +ES + LSGG+++KV B+ + ++
Sbjct: 135 GELQDRLESRDFYTLDAKIDEVARALGVMDFGMESDVTSLSGGQRTKVLLAKLLLEKPDI 194

Query: 241 LVLDEPTNHLDVDAKDELKRALKAYKGSILMVCHEPDF 278

L+LDEPTNHLD + + LKR L+ Y+ + +++ H+ F 
Sbjct: 195 LLLDEPTNHLDAEHIEWLKRYLQHYENAFVLISHDISF 232 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics.

Example 666

A DNA sequence (GBSx0706) was identified in S.agalactiae <SEQ ID 2053> which encodes the amino 
acid sequence <SEQ ID 2054>. This protein is predicted to be lipoprotein Nlpl precursor (pstS). Analysis 
of this protein sequence reveals the following: 

Possible site: 32

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2637 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:CAB14429 GB:Z99116 alternate gene name: yzmB-similar to
phosphate ABC transporter (binding protein) [Bacillus subtilis]

Identities = 42/62 (67%) , Positives = 49/62 (78%) 

Query: 15 SITSVGSTALQPLVEAAADEFGKTNLGKTINVQGGGSGTGLSQVQSGAVQIGNSDLFAEE 74
S+T GS+A+QPLV AAA++F + N I VQ GGSGTGLSQV GAVQIGNSD+FAEE
Sbjct: 45 SLTISGSSAMQPLVLAAAEKFMEENPDADIQVQAGGSGTGLSQVSEGAVQIGNSDVFAEE 104

Query: 75 KE 76 
KE 

Sbjct: 105 KE 106 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1695> which encodes the amino acid 
sequence <SEQ ID 1696>. Analysis of this protein sequence reveals the following: 

Possible site: 24 

>>> May be a lipoprotein

Final Results

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 63/74 (85%) , Positives = 71/74 (95%) 

Query: 3 LSGCANWIDKGQSITSVGSTALQPLVEAAADEFGKTNLGKTINVQGGGSGTGLSQVQSGA 62
LS C++WIDKG+SIT+VGSTALQPLVEA ADEFG +NLGKT+NVQGGGSGTGLSQVQSGA

Sbjct: 20 LSACSSWIDKGESITAVGSTALQPLVEAVADEFGSSNLGKTVNVQGGGSGTGLSQVQSGA 79

Query: 63 VQIGNSDLFAEEKE 76
VQIGNSD+FAEEK+
Sbjct: 80 VQIGNSDVFAEEKD 93

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
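
The "Identities" and "Positives" counts quoted with each alignment can be recovered from the middle row of the alignment itself. The following is a minimal sketch under stated assumptions: the function name alignment_counts is hypothetical, and it assumes the usual BLAST convention that a letter on the middle row marks an identical residue pair and a "+" marks a conservative substitution. It is applied to the last block of the GAS/GBS alignment above; counts from all blocks of an alignment would be summed to obtain the overall figures.

```python
def alignment_counts(query, midline, subject):
    """Count identities, positives and gap columns in one alignment block."""
    if not (len(query) == len(midline) == len(subject)):
        raise ValueError("the three alignment rows must have equal length")
    # A letter on the middle row marks an identical pair of residues.
    identities = sum(1 for ch in midline if ch.isalpha())
    # A '+' marks a conservative substitution; positives include identities.
    positives = identities + midline.count("+")
    # A '-' in either sequence row marks a gap column.
    gaps = sum(1 for q, s in zip(query, subject) if q == "-" or s == "-")
    return identities, positives, gaps, len(query)


# Last block of the GAS/GBS alignment (Query 63-76, Sbjct 80-93).
query = "VQIGNSDLFAEEKE"
midline = "VQIGNSD+FAEEK+"
subject = "VQIGNSDVFAEEKD"

identities, positives, gaps, length = alignment_counts(query, midline, subject)
print(f"Identities = {identities}/{length}, "
      f"Positives = {positives}/{length}, Gaps = {gaps}/{length}")
```

The sketch deliberately reports raw counts only; how the percentages printed in the text were rounded is not stated there, so no rounding rule is assumed.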

Example 667 

A DNA sequence (GBSx0707) was identified in S.agalactiae <SEQ ID 2055> which encodes the amino
acid sequence <SEQ ID 2056>. This protein is predicted to be lipoprotein Nlpl precursor (pstS). Analysis 
of this protein sequence reveals the following: 

Possible site: 60 

>>> Seems to have an uncleavable N-term signal seq

Final Results

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9343> which encodes amino acid sequence <SEQ ID 9344> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAB14429 GB:Z99116 alternate gene name: yzmB-similar to

phosphate ABC transporter (binding protein) [Bacillus subtilis] 
Identities = 95/184 (51%) , Positives = 126/184 (67%) , Gaps = 1/184 (0%) 

Query: 3 DHQmVAGLAVIWKKVNVKNLTTHQLRDIFAGKIKNt'JKEVGGQDLDISIINRAASSGSR 62 

DHQVAV G+A VN VK++4- +L+ IF GKIKNWKE+GG+D I+++NR SSG+R 
Sbjct: 115 DHQVAWGMAAAVNPDAGVKDI SKDELKKI FTGKI KM'JKELGGKDQKITLVNRPDSSGTR 174 

Query: 63 ATFDNTIMGNVAPIQSQEQDSNGMVKSIVSQTPGAISYLAFAYV-DKSVGTLKLNGFAPT 121 

ATF + P + +DS+ VK I++ TPGAI YLAF+Y+ D V L ++G P 
Sbjct: 175 ATFVKYALDGAEPAEGITEDSSNTVKKIIADTPGAIGYLAFSYLTDDKVTALSIDGVKPE 234 

Query: 122 AKNVTTDNWKLWSYEHMYTKGlffiTGLTKEFLDYMKSDK^QSSIVQHMGYISINDMKVVKD 181 

AKNV T + +W+Y+H YTKG TGL KEFLDY+KS+ +Q SIV GYI + DMKV +D 
Sbjct: 235 AKNVATGEYPIWAYQHSYTKGEATGLAJCEFLDYLKSEDIQKSIVTDQGYIPVTDMKVTRD 294 

Query: 182 AEGK 185 
A GK 

Sbjct: 295 ANGK 298 

There is also homology to SEQ ID 1696. 

SEQ ID 9344 (GBS659) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 135 (lane 2 & 3; MW 60kDa). It was also expressed in E.coli as a His-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 135 (lane 5-7; MW 35kDa) and in 
Figure 178 (lane 11; MW 35kDa). 

GBS659-His was purified as shown in Figure 228, lane 6-8. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 668 

A DNA sequence (GBSx0708) was identified in S.agalactiae <SEQ ID 2057> which encodes the amino 
acid sequence <SEQ ID 2058>. This protein is predicted to be phosphate transporter permease PstC
(pstC-2). Analysis of this protein sequence reveals the following:



Possible site: 47 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = Transmembrane - 51 ( 27 -
INTEGRAL Likelihood = Transmembrane - 183 ( 154 -
INTEGRAL Likelihood = Transmembrane - 298 ( 277 -
INTEGRAL Likelihood = Transmembrane - 101 ( 81 -
INTEGRAL Likelihood = Transmembrane - 149 ( 131 -

Final Results

bacterial membrane --- Certainty=0.7198 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 8635> which encodes amino acid sequence <SEQ ID 8636>
was also identified. Analysis of this protein sequence reveals the following: 



Lipop: Possible site: -1 Crend: 8 
SRCFLG: 0 

McG: Length of UR: 5 

Peak Value of UR: -0.12 
Net Charge of CR: 2 
McG: Discrim Score: -16.22 
GvH: Signal Score (-7.5): -4.26 

Possible site: 41 
>>> Seems to have no N-terminal signal sequence
Amino Acid Composition: calculated from 1
ALOM program count: 5 value: -15.50 threshold:
INTEGRAL Likelihood = Transmembrane 29 - 45 ( 21 - 55)
INTEGRAL Likelihood = Transmembrane 161 - 177 ( 148 -
INTEGRAL Likelihood = -6.3 Transmembrane - 292 ( 271 -
INTEGRAL Likelihood = -E Transmembrane - 95 ( 75 -
INTEGRAL Likelihood = -3 Transmembrane - 143 ( 125 -
PERIPHERAL Likelihood = C
modified ALOM score: 3.60
icml HYPID: 7 CFP: 0.720

*** Reasoning Step: 3

Final Results

bacterial membrane --- Certainty=0.7198 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database:

>GP:CAB14428 GB:Z99116 alternate gene name: yzmC-similar to 

phosphate ABC transporter (permease) [Bacillus subtilis] 
Identities = 145/303 (47%), Positives = 209/303 (68%), Gaps = 4/303 (1%) 

Query: 8 KNQELAKKLTSPSKNSRLEKFGKGITFLSLALIVFIVAM-ILIFVAQKGLSTFFVDGVKL 66

+N ++++L S +N +L++ + + ALI+ ++ I IF+ KGL +F V+GV 
Sbjct: 6 ENMSVSERLISSRQNRQLDEVRGRMIVTACALIMIAASVAITIFLGVKGLQSFLVNGVSP 65 

Query: 67 TDFLFNTKWEP--SAKSFGAFPMIAGSFIVTILSAIIATPFAIGAAVFMTEISPKYGSKI 124
+FL + W P S +G P I GSF VTILSA+IA P I +FMTEI+P +G K+

Sbjct: 66 IEFLTSLNWNPTDSDPKYGVLPFIFGSFAVTILSALIAAPLGIAGPIFMTEIAPNWGKKV 125

Query: 125 LQPAVELLVGIPSVVYGFIGLQIIVPFVRSI-FGGTGFGILSGVCVLFVMILPTVTFMTV 183
LQP +ELLVGIPSVVYGFIGL ++VPF+ GTG +L+G VL VMILPT+T ++

Sbjct: 126 LQPVIELLVGIPSVVYGFIGLTVLVPFIAQFKSSGTGHSLLAGTIVLSVMILPTITSISA 185

Query: 184 DSLRAVPRHYKEASLAMGATRWQTIWRVILNAARPGIFTAIVFGMARAFGEALAIQMVVG 243

D++ ++P+ +E S A+GATRWQTI +V++ AA P + TA+V GMARAFGEALA+QMV+G 
Sbjct: 186 DAMASLPKSLREGSYALGATRWQTIRKVLVPAAFPTLMTAVVLGMARAFGEALAVQMVIG 245 

Query: 244 NSAILPTSLTTPAATLTSVLTMGIGNTVIvlGTVQNNVLWSLALVLLIMSLAFNTVIKLITR 303 

N+ +LP S A TLT+++T+ +G+T G+V+NN LWS+ LVLL+MS F +1+ ++ 
Sbjct: 246 NTRVLPESPFDTAGTLTTIITLNMGHTTYGSVENNTLWSMGLVLLVMSFLFILLIRYLSS 305 

Query: 304 EGK 306

K 

Sbjct: 306 RRK 308 

A related DNA sequence was identified in S.pyogenes <SEQ ID 1691> which encodes the amino acid
sequence <SEQ ID 1692>. Analysis of this protein sequence reveals the following:

Possible site: 41 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood =-17.25 Transmembrane 29 - 45 ( 21 - 55)

INTEGRAL Likelihood = -7.22 Transmembrane 162 - 178 ( 154 - 184)

INTEGRAL Likelihood = -5.57 Transmembrane 282 - 298 ( 277 - 302)

INTEGRAL Likelihood = -5.41 Transmembrane 96 - 112 ( 81 - 116)

INTEGRAL Likelihood = -3.08 Transmembrane 133 - 149 ( 131 - 152) 



Final Results 

bacterial membrane --- Certainty=0.7899 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 266/311 (85%) , Positives = 290/311 (92%) , Gaps = 6/311 (1%) 

Query: 7 MKNQELAKKLTSPSKNSRLEKFGKGITFLSLALIVFIVAMILIFVAQKGLSTFFVDGVKL 66 

M+NQELAKKL SPSKNSRLE FG+ ITFL LALIVFIVAMILIFVAQKGLSTFFVD V L 
Sbjct: 1 MENQELAKKLASPSKNSRLETFGRTITFLCll^IVFIVAMILIFVAQKGLSTFFVDKVNL 60 

Query: 67 TDFLFNTKWEPSAKS------FGAFPMIAGSFIVTILSAIIATPFAIGAAVFMTEISPKY 120

DFLF +W+PS K+ GA PMI GSF+VTILSAIIATPFAIGAAVFMTEISPKY

Sbjct: 61 FDFLFGKEWQPSVKNAAGIPYLGALPMITGSFLVTILSAIIATPFAIGAAVFMTEISPKY 120

Query: 121 GSKILQPAVELLVGIPSVVYGFIGLQIIVPFVRSIFGGTGFGILSGVCVLFVMILPTVTF 180

G+K+LQPAVELLVGIPSVVYGFIGLQ+IVPF+RSIFGGTGFGILSGVCVLFVMILPTVTF
Sbjct: 121 GAKLLQPAVELLVGIPSVVYGFIGLQVIVPFMRSIFGGTGFGILSGVCVLFVMILPTVTF 180

Query: 181 MTVDSLRAVPRHYKEASLAMGATRWQTIWRVILNAARPGIFTAIVFGMARAFGEALAIQM 240

MT DSLRAVPRHY+EAS+AMGATRWQTIWRV+LNAARPGIFTA++FGMARAFGEALAIQM
Sbjct: 181 MTTDSLRAVPRHYREASMAMGATRWQTIWRVVLNAARPGIFTAVIFGMARAFGEALAIQM 240

Query: 241 WGNSAILPTSLTTPAATLTSVLTMGIGNTVMGTVQNNVLWSI^ 300 

WOTSA++P+SLTTPAATLTSVLTMGIGNTVMGTVQNN\^WSLALVLL+MSLAFN+++KL 
Sbjct: 241 WGNSAVMPSSLTTPAATLTSVMMGIGNTVMGTVQNN^ 300 

Query: 301 ITREGKKNYER 311 

IT+E K+NYER 
Sbjct: 301 ITKERKRNYER 311 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 669 

A DNA sequence (GBSx0709) was identified in S.agalactiae <SEQ ID 2059> which encodes the amino 
acid sequence <SEQ ID 2060>. Analysis of this protein sequence reveals the following:

Possible site: 13 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2469 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 670

A DNA sequence (GBSx0710) was identified in S.agalactiae <SEQ ID 2061> which encodes the amino 
acid sequence <SEQ ID 2062>. This protein is predicted to be probable ABC transporter permease protein in
sodA-comGA intergenic region. Analysis of this protein sequence reveals the following:

Possible site: 18 



>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood = Transmembrane 66 -
INTEGRAL Likelihood = Transmembrane 260 -
INTEGRAL Likelihood = Transmembrane 109 -
INTEGRAL Likelihood = Transmembrane 181 -

Final Results

bacterial membrane --- Certainty=0.4694 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAB14427 GB:Z99116 alternate gene name: yzmD-similar to 

phosphate ABC transporter (permease) [Bacillus subtilis] 
Identities = 157/294 (53%), Positives = 225/294 (76%) 

Query: 1 MNAKKADKLATTILYSIAAIIVTILASLLIFILVRGLPHVSWSFLTGKSSSVEAGGGIGI 60
MN K DKLAT + AMI IL L G+ +S+ F+T KSS+ AGGGI
Sbjct: 1 MNRKITDKLATGMFGLCAAIIAAILVGLFSYIIINGVSQLSFQFITTKSSAIAAGGGIRD 60

Query: 61
QL+NSF++L +T++I+IPL +G G++++EYA ++T+F+RTCIE+LSSLPS+V+G+FG
Sbjct: 61

Query: 121
+G++II GALALTVFNLP M E
+EA LALG+SRW TV
Sbjct: 121

Query: 181
il+TG +LASGR+FGEAAAL++TAG + P L+++ WN S TSP++IFR AE
Sbjct: 181

Query: 241
TLAVHIW VN++G IPDA
+L+FNL+AR +G
Sbjct: 241



A related DNA sequence was identified in S. pyogenes <SEQ ID 1685> which encodes the amino acid
sequence <SEQ ID 1686>. Analysis of this protein sequence reveals the following:



Possible site: 56

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood =-11. Transmembrane 17 - 33 ( 8 - 40)
INTEGRAL Likelihood =-10 Transmembrane 260 - 276 ( 257 - 285)
INTEGRAL Likelihood = -5 Transmembrane 66 - 82 ( 57 - 87)
INTEGRAL Likelihood = -5. Transmembrane 109 - 125 ( 105 - 129)
INTEGRAL Likelihood = -2 Transmembrane 181 - 197 ( 180 - 197)

Final Results

bacterial membrane --- Certainty=0.5755 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below: 

Identities = 257/294 (87%), Positives = 278/294 (94%)

Query: 1 MNAKKADKLATTILYSIAAIIVTILASLLIFILVRGLPHVSWSFLTGKSSSYEAGGGIGI 60
Sbjct: 1

Query: 61
QLYNSFFLLIVTLIISIPLS GAGIYL+EYAKKG +TNF+RTCIEILSSLPSWVGLFGY
Sbjct: 61

Query: 121
LIFWQF+YGFSIISGALALTVFNLPQMTR+VEDSL +VHHTQREAGLALG+SRWETV Y
Sbjct: 121

Query: 181
W+PEALP +VTG+VLASGRIFGEARAL1YTAGQSAPALDWSNV1N LSVTSPISIFRQ+E
Sbjct: 181

Query: 241
Sbjct: 241



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
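
Each alignment above is summarised by a line of the form "Identities = a/n (p%), Positives = b/n (q%)", where n is the aligned length. As a reading aid only, the following minimal sketch recomputes the percentages from the counts. It is not part of the original analysis: the name `parse_summary` is hypothetical, and the use of integer truncation is an assumption chosen because it reproduces the figures printed in this Example (for instance 278/294 is shown as 94%, not 95%).

```python
import re

# Hypothetical helper for illustration only; not part of the original analysis.
COUNT_RE = re.compile(r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)")


def parse_summary(line):
    """Map each field to (count, aligned_length, truncated integer percent)."""
    fields = {}
    for name, count, total in COUNT_RE.findall(line):
        count, total = int(count), int(total)
        # Integer division truncates, which matches the percentages printed above.
        fields[name] = (count, total, (100 * count) // total)
    return fields


line = "Identities = 257/294 (87%) , Positives = 278/294 (94%)"
summary = parse_summary(line)
print(summary)
```

Recomputing the percentage from the two counts gives a quick consistency check on a summary line whose digits may have been damaged in reproduction.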

Example 671 

A DNA sequence (GBSx0711) was identified in S.agalactiae <SEQ ID 2063> which encodes the amino 
acid sequence <SEQ ID 2064>. This protein is predicted to be phosphate ABC transporter, ATP-binding 
protein (pstB) (pstB-2). Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4506 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAB99016 GB:U67544 phosphate specific transport complex 
component (pstB) [Methanococcus jannaschii] 
Identities = 154/247 (62%) , Positives = 204/247 (82%) 

Query: 21 LTTKDLHVYYGEKEAIKGIDMQFEKNKITALIGESGCGKBTYLRSLNRMNDTIDIARVTG 80 



Query: 81 QIMYEGIDVl^QDINVYEMRKHIGIWFQRPNPFAKSIYKKITFAYERAGVKDKKFLDEW 140 

+++ +G ++ +D++VYE+RK +GMVFQ+PNPFA SIY N+ F G+KDKK LD++V 

Sbjct: 66 EVLLDGKNIYDKDVDVYELRKRVGMVFQKPNPFAMSIYDNVAFGPRIHGIKDKKELDKIV 125 

Query: 141 ETSLKQAALWDQVKDDLHKSAFTLSGGQQQRLCIARAIAVKPEILLMDEPASALDPIATM 200 

E +LK+AALWD+VKD+LHK+A +LSGGQQQRLCIARAIAVKPE+LLMDEP SALDPI+T+ 
Sbjct: 126 EWALKKAALWDEVKDELHKNALSLSGGQQQRLCIARAIAVKPEVLLMDEPTSALDPISTL 185 

Query: 201 QLEETMFELKKNYTIIIVTHNMQQAARASDYTAFFYLGDLIEYDKTNNIFQNAKCQSTSD 260 

++EE M EL K+YTI++VTHNMQQA+R SDYTAFF +G LIE+ +T IF N + + T D 
Sbjct: 186 KIEELMVELAKDYTIVVVTHNMQQASRVSDYTAFFLMGKLIEFGETEQIFLNPQKKETDD 245 

Query: 261 YVSGRFG 267 

Y+SGRFG 
Sbjct: 246 YISGRFG 252 



A related DNA sequence was identified in S.pyogenes <SEQ ID 1681> which encodes the amino acid
sequence <SEQ ID 1682>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2796 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 242/267 (90%) , Positives = 258/267 (95%) 

Query: 1 MAEYNWDERHIlTFPEENSALTTKDLHVYyGEKEAIKGIDMQFEKNKITALIGPSGCGKS 60 

M EYNW+ERHIITFPEE AL TKDLHVYYG KEAIKGIDMQFEK+KITALIGPSGCGKS 
Sbjct: 1 OTEYNWNERHIITFPEETLAIATKDLHVYYGAKEAIKGIDMQFEKHKITALIGPSGCGKS 60 

Query: 61 TYLRSI J NRMNDTID1ARVTGQIMYEGIDVNAQDINVYE^^RKHIG^WFQRPNPFAKSIYKN 120 

TYLRSLNRMNDTID1ARVTG+I+Y+GIDVN +D+NVYE+RKH+GMVFQRPNPFAKSIYKN 
Sbjct: 61 TYLRSLNRMNDTID1ARVTGEILYQ/3IDVNRKDMNVYEIRKHLGMVFQRPNPFAKSIYKN 120 

Query: 121 ITFAYERAGVKDKICFLDEWETSLKQAALWDQVKDDLHKSAFTLSGGQQQRLCIARAIAV 180 

ITFA+ERAGVKDKK LDE+VETSLKQAALWDQVKDDLHKSAFTLSGGQQQRLCIARAI+V 
Sbjct: 121 ITFAHERAGVKDKKVLDEIVETSLKQAALWDQVKDDLHKSAFTLSGGQQQRLCIARAISV 180 

Query: 181 KPEILLMDEPASALDPIATMQLEETMFEIjKKNYTIIIVTHNMQQAARASDYTAFFYLGDL 240 

KP+ILLMDEPASALDPIATMQLEETMFELKKNYTIIIVTHNMQQAARASDYTAFFYLG+L 
Sbjct: 181 KPDILLMDEPASALDPIATMQLEETMFELKKNYTIIIVTHNMQQAARASDYTAFFYLGNL 240 

Query: 241 IEYDKTNNI FQNAKCQSTSDYVSGRFG 267 

IEYDKT NI FQNA+ CQST+DYVSG FG 
Sbjct: 241 IEYDKTRNIFQNAQCQSTNDYVSGHFG 267 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 672 

A DNA sequence (GBSx0712) was identified in S.agalactiae <SEQ ID 2065> which encodes the amino 
acid sequence <SEQ ID 2066>. This protein is predicted to be phosphate ABC transporter, ATP-binding 
protein (pstB-1). Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3806 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9815> which encodes amino acid sequence <SEQ ID 9816> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAB14426 GB:Z99116 alternate gene name: yzmE~similar to
phosphate ABC transporter (ATP-binding protein) 
[Bacillus subtilis] 
Identities = 148/248 (59%) , Positives = 189/248 (75%) 

Query: 5 ILQVSDLSVYYNECKXALKEVSMDFYPNEITALIGPSGSGKSTLLRAINRMGDIiNPEVTLT 64 

+L+V DLS+YY K+A+ V+MD N +TALIGPSG GKST LR INRM DL P 
Sbjct: 22 VLEVKDLSIYYGNKQATOHVNMDIEKMVTALIGPSGCGKSTFLRNINRMM3LIPSARAE 81 

Query: 65 GAVMYNGHNVYSPRTDTVELRKEIC^IVFQQPNPFPMSVFENVVYGLRLKGIKDKATLDEA 124 
Query: 125 VETSLKGASIWDEVKDRLHDSALGLSGGQQQRVCIARTLATKPKIILLDEPTSALDPISA 184

VE SL A++WDEVKDRLH SAL LSGGQQQR+CIARTLA KP ++LLDEP SALDPIS
Sbjct: 142 VEESLTKAALWDEVKDRLHSSALSLSGGQQQRLCIARTLAMKPAVLLLDEPASALDPISN 201

Query: 185 GKIEETLHGLKDQYTMLLVTRSMQQASRISDRTGFFLDGKLIEYGNTKEMFMNPKHKETE 244 

KIEE + GLK +Y++++VT +MQQA R+SDRT FFL+G L+EYG T+++F +PK ++TE 
Sbjct: 202 AKIEELITGLKREYSIIIVTHNMQQALRVSDRTAFFLNGELVEYGQTEQIFTSPKKQKTE 261 

Query: 245 DYITGKFG 252

DYI GKFG
Sbjct: 262 DYINGKFG 269

A related DNA sequence was identified in S.pyogenes <SEQ ID 2067> which encodes the amino acid 
sequence <SEQ ID 2068>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3590 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 208/252 (82%) , Positives = 235/252 (92%) 

Query: 1 MTQPILQVSDLSVYYNKKKALKEVSMDFYPNEITALIGPSGSGKSTLLRAINRMGDLNPE 60 

MT+PILQ+ DLSVYYN+KK LK+VS+D YPNEITALIGPSGSGKSTLLR+INRM DLNPE 
Sbjct: 2 MTEPILQIRDliSVYYNQKKTLKDVSLDLYPNEITArjlGPSGSGKSTLLRSINRMNDLNPE 61 

Query: 61 VTLTGAVl»™GHmrfSPRTDTVELRKEIGMVFQQPNPFPMSVFENVVYGLRLKGIKDKAT 120 

VT+TG+++YNGHN+YSPRTDTV+LRKEIGMVFQQPNPFPMS++ENVVYGLRLKGI+DK+ 
Sbjct: 62 VTITGSIVYNGHNIYSPRTDTVDLRKEIGMVFQQPNPFPMSIYENWYGLRLKGIRDKSI 121 



Sbjct: 122 LDHAVESSLKGASIMNEVKDRLHDSAVGLSGGQQQRVCIARVLATSPRIILLDEPTSALD 181 

Query: 181 PISAGKIEETIBGLKDQYTMLLOTRSMQQASRISDRTGFFLDGKLIEYGMTKEMFMNPPCH 240 

PISAGKIEETL LK YT+ +VTRSMQQASR+SDRTGFFL+G+L+E G TK MFMNPK 
Sbjct: 182 PlSAGKIEETIiLLKKDYTIAIVTRSMQQASRLSDRTGFFLEGDLLECGPTKAMFMMPJCR 241 

Query: 241 KETEDYITGKFG 252 

KETEDYI +GKFG 
Sbjct: 242 KETEDYISGKFG 253 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 673 

A DNA sequence (GBSx0713) was identified in S.agalactiae <SEQ ID 2069> which encodes the amino 
acid sequence <SEQ ID 2070>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1937 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:AAD22042 GB:AF118229 PhoU [Streptococcus pneumoniae] 
Identities = 75/216 (34%), Positives = 126/216 (57%), Gaps = 1/216 (0%)







Query: 2 LRSKFDEELDKLHNQFYAMGIEAIGQIKKTVRAFVSHDRELAKEVIEDDVTLNNFETKLE 61
+R++FD EL +L F +G + K + A S D+E+A+ +1 D +N ++ +E
Sbjct: 1 MRNQFDLELHELEQSFLGLGQLVL3TASKALIALASKBKEMAELIINKDHAINCGQSAIE 60

Query: 62 KKSLEIIALQQPVSQDLRTVITVLKATSDVERMGDHAAAVAKATIRMKGEERIPAVELEI 121
++ALQQP DLR VI+++ + SD+ERMGDH A +AKA +++K E ++ E ++
Sbjct: 61 LTCARLLALQQPQVSDLRFVISIMSSCSDLERMGDHMAGIAKAVLQLK-ENQLAPDEEQL 119

Query: 122 IMGKAVKM4LEEALTAYINGDDEKAYEVAAMDEIVDDYFRDIQKMVVETIQKHPDVAFA 181
+ MGK +ML + L A+ KA +A DE +D Y+ + K ++ ++
Sbjct: 120 HQMGKLSLSMLADLLVAFPLHQASKAISIAQKDEQIDQYYYALSKEIIGLMKDQETSIPN 179

Query: 182 AKEYFQVLMHLERIGDYGKNICEWIVYLKTGKIIEL 217
+Y ++ HLER DY NICE +VYL+TG++++L
Sbjct: 180 GTQYLYIIGHLERFADYIANICERLVYLETGELVDL 215








A related DNA sequence was identified in S.pyogenes <SEQ ID 1677> which encodes the amino acid 
sequence <SEQ ID 1678>. Analysis of this protein sequence reveals the following: 



Final Results

bacterial cytoplasm --- Certainty=0.2229 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 174/217 (80%) , Positives = 194/217 (89%) 

Query: 1 MLRSKFDEELDKLHNQFYAMGIEAIGQIKKTVRAFVSHDRELAKEVIEDDVTLNNFETKL 60

MLR+KF+EELDKLHNQFY+MG+E + QI KTVRAFVSHDRELAKEVIE+D T+NNFETKL
Sbjct: 1 MLRTKFEEELDKLHNQFYSMGMEVIAQINKTVRAFVSHDRELAKEVIEEDDTINNFETKL 60

Query: 61 EKKSLEIIALQQPVSQDLRTVITVLKATSDVERMGDHAAAVAKATIRMKGEERIPAVELE 120 

EKKSLEI IALQQPVS DLR VITVLKA+SD+ERMGDHAA++AKATTRMKGEERIP VE + 
Sbjct: 61 EKKSLEI IALQQPVSNDLRMVITVLKASSDIERMGDHAASIAKATIRMKGEERI PWEEQ 120 



Query: 181 AAKEYFQVLMHLERIGDYGKNI CEWIVYLKTGKI IEL 217 

A KEYFQVLM+LERIGDY +NICEWIVYLKTGKIIEL 
Sbjct: 181 AGKEYFQVLMYLERIGDYARNI CEWIVYLKTGKI IEL 217 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 674 

A DNA sequence (GBSx0714) was identified in S.agalactiae <SEQ ID 2071> which encodes the amino 
acid sequence <SEQ ID 2072>. This protein is predicted to be aminopeptidase N. Analysis of this protein 
sequence reveals the following: 
Possible site: 30 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2845 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 



Query: 3 TVEHFVTKPVPENYNLFLDINRQTKTFSGNVAVSGEALDNNISFHQKGLTIKSVLLDNQP 62
+V F+ F+PENYNLFLDINR KTF+GNVA+H-GEA+DN+IS HQK LTI SVLLDN+
Sbjct: 4 SVARFIESFIPENYNLFLDINRSEKTFTGNVAITGEAIDNHISLHQKDLTINSVLLDNES 63

[Residue lines for the rows beginning at Query 63, 123, 183, 243, 303, 362, 422, 482, 542, 602, 662, 722 and 782 (Sbjct 64, 124, 184, 244, 304, 364, 424, 484, 542, 602, 662, 722 and 781) are illegible; the surviving match-line fragments follow in their original order.]

L FP +DEPEAKATFDLSLKFD +EG+ ALSKMPEIN+ R+ETG+WTF4TTP+MS
+YLLAF G LHGKT TKNGT VG +AT A N +DF+LDI WVIEFYEDYF V+YP
V'I'MKWWDDLWLNESFANMMEYVS++ IEP NIFE F G+P AL+RDATDGVQSVH+
AIVYAKGSRLMHMLRRWLGD FA GLK YFEKHQY NT+GRDLWN
ALS+ SGKDV++FMD+WLEQPGYPV++A4-+ +D LIL+QKQFFIGEHEDK RLW+IPLN+
NW G+P+ L+EE + 1PN4SQLA +N NG LR NT NTAHYIT+YQGQLL++I+ D
+D +SKLQI+QER LLAESG ISY+SL+ L+ L+ +E A
+F RLGF+ +EG3 D+ EMVR +LS
F+AH+ NI +IPA+IR LVL N+MK S L + Y T D NF+RQL ALS+
f L +L K+K++VKPQDL + WY FL SF QE+VW+WA+ENWEWIKA LGGDM
SFD FV P+ FK +ERL+QY FFEPQ SD A+ RNI MGIK I+ARV LI K+K V

Query: 842 INTIKKY 848
+ +K Y
Sbjct: 841 ESALKDY 847



A related DNA sequence was identified in S.pyogenes <SEQ ID 2073> which encodes the amino acid 
sequence <SEQ ID 2074>. Analysis of this protein sequence reveals the following: 



Final Results

bacterial cytoplasm --- Certainty=0.1098 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below:

Identities = 576/848 (67%) , Positives = 692/848 (80%) , Gaps = 3/848 (0%) 



Query: 1
^ +Q+D DNE + ++L ETG M LV EFSG ITDNMTG+YPSYYT NG KKEVISTQF
Sbjct: 21

Query: 61
ESHFARE FP IDEP+AKATFDLSL FDQ+ GEIALSNMPE+N ++R+ETGLWTFDTT +
Sbjct: 81

Query: 121
MSSYLLAFALGELHGKT +K GT VG YAT AH L+ LDFSLDI VRVI FYEDYFGV
Sbjct: 141

Query: 181
YPIPQSL++ALPDFS+GAMENWGL+TYRE+YLLVDENS+V SRQQVALV+AHEIAHQWFG
Sbjct: 201

Query: 241
NLVTMKW11DDLWLNESFAKMMEYVSH- IEP I EDFQTGG+ PLALKRDATDGVQSVH
Sbjct: 261

Query: 301
VEVNHPDEINTLFDPAIVYAKGSRLMHKLRR-r+GD DFA GL YFEK+QY+NT+GRDLW
Sbjct: 321

Query: 361
I LS TSGKDVAAFMD+WLEQPGYPV+ A++E D+LIL+QKQFFIG+ E+K RLW IPLN
Sbjct: 381

Query: 421
+NW G+PE LTE +VIPNFSQIA +N+ GALRFN +NTAHYIT+YQG LL+ ++++L
Sbjct: 441

Query: 481
)J S LQ++QER LLA+SG+ISY+ L+ L+4- L SY+V A++ V+ GL F+ E
Sbjct: 501

Query: 541
S E F V + +FN+ GFEK+
Sbjct: 559

Query: 601
Sbjct: 619

Query: 661
IF+A++NNIA+IPAA+R LVL N+MK+FE-i- L + ETY TTD N + L AST
Sbjct: 679 IFEAYQNNIASIPAAWRLVtANQMKYFETDSLVDIYFETYVATTDMNLRSDLTVAFSQT 738

Query: 721 TDSKTLKCLLSDWKNKDIVKPQDLAI^SVIYATFLKI^SFTQESVWEWAQENWEWIKATLGGD 780 
T++++L K+KDI+KPQDL+ WY L SFTQ+ +WEWA+ENW+WIK+ LGGD
Sbjct: 739 KQPTTIRRILVSLKDKDIIKPQDLSY-WYHALLGQSFTQDIIWEWARENWDWIKSALGGD 797 

Query: 781 MSFDKFVIYPSSSFKTEERLEQYKNFFEPQLSDMAISRNISMGIKEISARVLLITKQKEE 840 

MSFDKFVIYP+S+FKT + L +YK+FFEP+L DMMSRNI+MGI EI ARV LITK+KE 
Sbjct: 798 MSFDKFVIYPASNFKTPKHLAEYKSFFEPKjDDI^ISRNITMGIHEIEARVALITKEKEA 857 

Query: 841 VINTIKKY 848 

VI + Y 
Sbjct: 858 VIAALSHY 865 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 675 

A DNA sequence (GBSx0715) was identified in S.agalactiae <SEQ ID 2075> which encodes the amino 
acid sequence <SEQ ID 2076>. This protein is predicted to be response regulator (trcR). Analysis of this 
protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2741 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

3 pneumoniae] 

Query: 1 MIKILLIEDDLSLSNSVFDFLDDFADVMQIFDGEEGLYEAESGVYDLILLDLMLPEKNGF 60 

MIKILL+EDDL IiSNSVFDFLDDFADVMQ+FDGEEGLYEAESGVYDLILLDLMLPEKNGF 
Sbjct: 1 MIKILLVEDDLGLSNSVFDFLDDFADVMQVFDGEEGLYEAESGVYDLILLDLMLPEKNGF 60 

Query: 61 QVLKELREKGITTPVLIMTAKESIDDKBCjGFDLGSDDyLTKPFYLEELKMRIQALLKRSG 120 

QVLKELREKGITTPVLIMTAKES+DDKG GF+LGADDYLTKPFYLEELJ^MRIQALLKRSG 
Sbjct: 61 QVLKELREKGITTPVLIMTAKESLDDKC-HGFELGADDYLTKPFYLEELKMRIQALLKRSG 120 

^ KFN+N+L YG+I V++STN+ V T VELLGKEFDLLWFLQNQNVILPK+QIFDR+WG 

Sbjct: 121 KFNENTLTYGNIVVNLSTNTVKVSDVPVELLGKEFDLLVYFLQNQNVILPKTQIFDRLWG 180 

Query: 181 FDSDTTISWEVYVSKVRKKLKGTLFSENLQTLRSVGYILKHVE 224 

FDSDTTI SWEVYVS KVRKKLKGT F+ENLQTLRSVGY+DK V+ 
Sbjct: 181 FDSDTTISWEVYVSKVRKKLKGTTFAENLQTLRSVGYLLKDVQ 224 

A related DNA sequence was identified in S.pyogenes <SEQ ID 2077> which encodes the amino acid 
sequence <SEQ ID 2078>. Analysis of this protein sequence reveals the following: 

Possible site: 59 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2689 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below: 

Identities = 180/224 (80%), Positives = 200/224 (88%) 

Query: 1 MIKILLIEDDLSLSNSVFDFLDDFADVMQIFDGEEGLYEAESGVYDLILLDLMLPEKNGF 60

MIKILL+EDDLSLSNS+FDFLDDFADVMQ+FDG+EGLYEAESG+YDLILLDLMLPEKNGF 
Sbjct: 1 MIKILLVEDDLSLSNSIFDFLDDFADVMQVFDGDEGLYEAESGIYDBILLDLMLPEKNGF 60 

Query: 61 QVLKELREKGITTPVLIMTAKESIDDKGQGFDLGADDYLTKPFYLEELKMRIQALLKRSG 120 

QVLKELREK I PVLIMTAKE +DDKG GF+LGADDYLTKPFYLEELKMRIQALLKR+G 
Sbjct: 61 QVLKELREKDIKIPVLIMTAKEGLDDKGHGFELGACOYLTKPFYLEELKMRIQALIiKRTG 120 

Query: 121 KFNDNSLIYGDIRVDMSTNSTFVNQTEVELLGKEFDLLVYFLQNQNVILPKSQIFDRIWG 180 

KF D ++ +G+4 VD++ V VELLGKEFDLLVY LQNQNVILPK+QIFDR+WG 

Sbjct: 121 KFADKHISFGNLWDLARKEVKVEGKl'VELLGKEFDLLVYLLQNQNVILPKTQIFDRLWG 180 

Query: 181 FDSDTTI SWEVYVSKVRKKLKGTLFSENLQTLRSVGYILKHVE 224 

FDSDTTI SWEVY+SK+RKKLKGT F LQTLRSVGYILK+ E 
Sbjct: 181 FDSDTTISWFVYISKIRKKLKGTCFVNRLQTLRSVGYILKNNE 224 



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.



Example 676 

A DNA sequence (GBSx0716) was identified in S.agalactiae <SEQ ID 2079> which encodes the amino
acid sequence <SEQ ID 2080>. This protein is predicted to be histidine kinase. Analysis of this protein
sequence reveals the following: 



Possible site: 34

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -9.18
INTEGRAL Likelihood = -4.94 Transmembrane 182 - 198 ( 178

Final Results

bacterial membrane --- Certainty=0.4673 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAA54466 GB:X77249 histidine kinase [Streptococcus pneumoniae] 
Identities = 218/420 (51%), Positives = 305/420 (71%), Gaps = 4/420 (0%)

Sbjct: 

Query: 75 -LDNSNIASVKLKPGGQTVANTDI ILFTSEEEVINYFDAFSNYQFLKPNKKNLGGI SELT 133 

L+N+ + K++ +NT++ILF + + + F +K KK LG I ++ 

Sbjct: 75 DLENARADASKVEIKPNVSSNTEVILFDKDFTQLLSGNRFLGLDKIKLEKKELGHIYQIQ 134 

Query: 134 LTNIFGQDETYHAVTVKVN-NPAYPNVTYMTAIVNIDQLVNAKERYEKIIIFVMTTFWII 192
+ N +GQ+E Y + ++ N + N+ Y ++N QL A +++E++I+ VM +FWI +

Sbjct: 135 VFNSYGQEEIYRVILMETNISSVSTNIKYAAVLINTSQLEQASQKHEQLIVVVMASFWIL 194

Query: 193 SIGASIYLAKWAQKPIIENYERQKAFVENASHELRTPIAVLQNRLETLFRKPNATILENS 252

S+ AS+YLA+ + +P++E+ ++Q++FVENASHELRTPIAVLQNRLETLFRKP ATI++ S
Sbjct: 195 SLLASLYLARVSVRPLLESMQKQQSFVENASHELRTPIAVLQNRLETLFRKPEATIMDVS 254

Query: 253 ENIASSLDEVRNMRILTTNLLNIARRDDGIKPELAVIKPTLFDSIFENYDLITQENGKNF 312

E+IASSL+EVRNMR LTT+LLNLARRDDGIKPELA + + F++ F NY++1 EN + F
Sbjct: 255 ESIASSLEEVRNMRFLTTSLLNLARRDDGIKPELAEVPTSFFNTTFTNYEMIASENNRVF 314

Query: 313 TGHNMIQDSFKTDKTLLKQLMTILFDMAIKYTDNDGSIDFTISETDKYLFLEIADNGPGI 372 

N I + TD+ LLKQLMTILFDNA+KYT+ DG IDF IS TD+ L+L ++DNG GI 
Sbjct: 315 RFENRIHRTIVTDQLLLKQLMTILFDNAVKYTEEDGEIDFLISATDRNLYLLVSDNGIGI 374 

Query: 373 SEEDKVRIFDRFYRVDKARTRQQGGFGLGLSLAQQIVNSLRGNITVIDNKPRGSIFKIKL 432 
S EDK +IFDRFYRVDKARTRQ+GGFGLGLSLA+QIV++L+G +TV DNKP+G+IF++K+
Sbjct: 375 STEDKKKIFDRFYRVDKARTRQKGGFGLGLSLAKQIVDALKGTVTVKDNKPKGTIFEVKI 434

A related DNA sequence was identified in S.pyogenes <SEQ ID 2081> which encodes the amino acid
sequence <SEQ ID 2082>. Analysis of this protein sequence reveals the following: 

Possible site: 57 
>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood =-11.09 Transmembrane 19 - 35 ( 14 - 44)
INTEGRAL Likelihood =-10.24 Transmembrane 185 - 201 ( 182 - 206)

Final Results

bacterial membrane --- Certainty=0.5437 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

>GP:CAA54466 GB:X77249 histidine kinase [Streptococcus pneumoniae]
Identities = 223/436 (51%), Positives = 313/436 (71%), Gaps = 5/436 (1%) 

Query: 2 NKLKKEILSDNYNHFFHFFAVFTGIFVIMTIIILQIMRFGVYSSVDSSLVSVSNNASSYA 61 

+KLKK +D++++F F VFT IF MT+IILQ+M +Y+SVD L +S N + 
Sbjct: 3 SKLKKTWYADDFSYFIRNFGVFTLIFSTMTLIILQVMHSSLYTSVDDKLHGLSENPQAVI 62 



Query: 122 NFIILDiajRLGSIETTSLMFYGQEEKYHTITVGVHIX^PA-VAYMMAVVNVEQLDRANE 180 

L+K+ LG I + N YGQEE Y I + +1 + + Y ++N QL++A++ 
Sbjct: 119 KIKLEKKELGHIYQIQVFNSYGQEEIYRVILMETNISSVSTNIKYAAVLINTSQLEQASQ 178 

Query: 181 RYERIIIIVMSVFWLISILASIYLAKWSRKPILESYEKQKMFVENASHELRTPLAvLQNR 240 

++E++I++VM+ FW++S+LAS+YLA+ S +P+LES +KQ+ FVENASHELRTPLAVLQNR 
Sbjct: 179 KHEQLIVVVMASFWILSLLASLYLARVSVRPLLESMQKQQS FVENASHELRTPLAVLQNR 238 

Query: 241 LESLFRKPNETILENSEHIASSLDEVRNMRILTTNLLNLARRDDGINPQWTHLDTDFFNA 300 

LE+LFRKP TI++ SE +ASSL+EVRNMR LTT+LLNLARRDDGI P+ + T FFN 
Sbjct: 23 9 LETLFRKPEAT IMDVSESIAS SLEEVRNMRFLTTSLLNLARRDDG I KPELAEVPTSFFNT 298 

Query: 301 I FENYELVAKEYGKI FYFQNQVNRSLRMDKALLKQLIT I LFDNAI KYTDKNGI IE 1 1 VKT 360 

F NYE4+A E ++F F+N+4-+R++ D+ LIjKQL+TILFDNA+KYT+++G 1+ ++ 
Sbjct: 299 TFTNYEMIASENNRVFRFENRIHRTIVTDQLLLKQLMTILFDNAVKYTEEDGEIDFLISA 358 

Query: 361 TDKNLLISVIDNGPGITDEEKKKIFDRFYRVDKARTRQTGGFGLGLALAQQIVMSLKGNI 420 

TD+NL + V DNG GI+ E+KKKI FDRFYRVDKARTRQ GGFGLGL+LA+QIV +LKG + 
Sbjct: 359 TDRNLYLLVSDNGIGISTEDIQCKIFDRFYRVDKARTRQKGGFGLGLSLAKQIVDALKGTV 418 

Query: 421 TVKDNDPKGS I FEVKL 436 

TVKDN PKG+IFEVK+ 
Sbjct: 419 TVKDNKPKGTIFEVKI 434 

An alignment of the GAS and GBS proteins is shown below: 

Identities = 255/436 (60%), Positives = 334/436 (75%), Gaps = 10/436 (2%) 

Query: 7 ISKFKKNV-SDS--HFIHFFTVFSGIFLVMTVIILQVMRYGVYSSVDSSLKYISTHPKNY 63 

++K KK + SD+ HF HFF VF+GIF++MT+IILQ+MR+GVYSSVDSSL +S + +Y 
Sbjct: 1 MNKLKKEILSDNYNHFFHFFAVFTGIFVIMTIIILQIMRFGVYSSVDSSLVSVSNNASSY 60 



Sbjct: 61 ANRTMARISSFYFDTENNIIKALPDSDSSKLLGTPAANTDIILFSANGTILNAFDAFSNY 120 



Query: 117 QFLKPNKKNLGGISELTLTNIFGQDETYHAVT\7KVNNPAYPNVTYMTAIWIDQLVNA^ 176 
Q +K+ LG I +L N +GQ+E YH +TV V+ YP V YM A+VN++QL A E 
Sbjct: 121 QNFHLDKI^LGSIETTSL^FYGQEEKXHTIWGVHIKI^PAVAYMMAVVNVEQLDRMIE 180

Query: 177 RYEKI II FVMTTFWI IS IGASIYIAKWAQKPIIENYERQKAFVENASHELRTPLA.VLQNR 236 

RYE+III VM+ FW+ISI ASIYIAKW++KPI+E+YE+QK FVENASHELRTPIAVLQNR 
Sbjct: 181 RYERIIIIVMSVFWLISILASIYLA™3RKPILESYEKQK>;F/ENASHELRTPLAVLQNR 240 

Query: 237 LETLFRKPNATILENSENIASSLDEVRNMRILTTtttliNIiARRDDGIKPELAVIKPTLFDS 296 

LE+LFRKPN TILENSE+ +ASS LDE VRNMRI LTTNLLNIARRDDG I P+ + F+4 
Sbjct: 241 LESLFRKPNETILENSEHIA8SLDE\T»IRILTraLIA"LARRDDGINPQWTHLDTDFFNA 300 

Query: 297 IFENYDLITQENGKNFTGHNMIQDSFKTDKTLLKQLMTILFDNAIKYTDNDGSIDFTISE 356 

IFENY+L+ +E GK F N + S + DK LLKQL+TILFDNAIKYTD +G 1+ + 
Sbjct: 301 I FENYELVAKEYGKI FYFQNQVNRSLRMDKMjLKQJjITILFDNAI KYTDKNGI IE I I VKT 360 

Query: 357 TDKYLFLEIADNGPGISEEDKVRIFDRFYRVDKARTRQQGGFGLGLSLAQQIVNSLRGNI 416 

TDK L 4 4- DNGPGI++E+K + 1 FDRF YRVDKARTRQ GC-FGLGL+LAQQIV SL+GNI 
Sbjct: 361 TDKNLLISVIDNGPGITDEEKKKIFDRFYRVDKARTRQTGGFGLGLALAQQIVMSLKGNI 420 

Query: 417 TVIDNKPRGS I FKI Kh 432 

TV DN P+GSIF++KL 
Sbjct: 421 TVKDNDPKGSIFEVKL 436 

SEQ ID 2080 (GBS339d) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total 
cell extract is shown in Figure 146 (lane 9; MW 73kDa). It was also expressed in E.coli as a His-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 185 (lane 5; MW 73kDa). 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 677 

A DNA sequence (GBSx0717) was identified in S.agalactiae <SEQ ID 2083> which encodes the amino 
acid sequence <SEQ ID 2084>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1783 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9813> which encodes amino acid sequence <SEQ ID 9814> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database: 



Query: 1 MEIEKTNRMNALFEFYAALLTDKGM1WIELYYADDYSLAEIAEESGVSRQAVYDNIKRTE 60 

MEIEKTNRMNALFEFYAALLTDKQMNYIELYYADDYSLAEIAEE VSRQAVYDNI KRTE 
Sbjct: 1 MEIEKTNRMNALFEFYAALLTDKQMNYIELYYADDYSLAEIAEEFDVSRQAVYDNIKRTE 60 

Query: 61 KILEAYEMKLHMYSDYIVRSQIFDDILEKYTDDftFLQEKISILSSIDNRD 110 

KILE YEMKLHMYSDY+VRS+IFD I++KY +D +LQ KISIL++IDNRD 
Sbjct: 61 KILEDYEMKLHNTYSDYVVRSEIFDAIMKKYPNDPYLQNiaSILTTIDNRD 110 

A related DNA sequence was identified in S.pyogenes <SEQ ID 2085> which encodes the amino acid 
sequence <SEQ ID 2086>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1767 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 95/110 (86%), Positives = 103/110 (93%)

Query: 1 MEIEKTNRM^ALFEFYA^LTDKQMNYIELYYADDYSLTaEIAEESGVSROAVYDNIKRTE 60 

MEIEKTNRMN7ALFEFYAALLTDKQMNYIELYYADDYSLAEIA+E GVSRQAVYDNIKRTE 
Sbjct: 4 MEIEKTNRMNALFEFYAALLTDKQMmflELYYADDYSI^IADEFGVSRQAvYDNrKRTE 63 

Query: 61 KILEAYEMKLHMYSDYIWSQIFDDILEKYTDDAFLQEKISILSSIDNRD 110 

KILE YEMKLHMYSDY+VRS+IFDD++ Y D +LQEKISIL+SIDNR+ 
Sbjct: 64 KILETYEMKLHMYSDYWRSEIFDDMIAHYPHDEYLQEKISILTSIDNRE 113 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 678 

A DNA sequence (GBSx0719) was identified in S.agalactiae <SEQ ID 2087> which encodes the amino 
acid sequence <SEQ ID 2088>. This protein is predicted to be signal recognition particle protein (ffh). 
Analysis of this protein sequence reveals the following: 
Possible site: 51 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -0.22 Transmembrane 37 - 53 ( 37 - 53)

Final Results

bacterial membrane --- Certainty=0.1086 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAB48050 GB:U88582 Ffh [Streptococcus mutans] 
Identities = 437/522 (83%) , Positives = 484/522 (92%) , Gaps = 7/522 (1%) 

Query: 1 MAFESLTERLQGVFKNIRGKKKLSEKDVQEVTKEIRLALLEADVALPVVKTFIKHVRERA 60
MAFESLTERLQGVFKN+RGK+KLSEKDVQEVTKEIRLALLEADVALPWK FIK VR+RA
Sbjct: 1

Query: 61
Sbjct: 61

Query: 121
LANKL+KE+NARP+MIAADIYRPAAIDQLK LG QINVPVFDMGT HSAVEIV++GL QA
Sbjct: 121

Query: 181
+ENRNDYVLIDTAGRLQID LM EL D+KA+A PNEILLWDSMIGQEAANVA EFN+Q
Sbjct: 181

Query: 241
L ++GV+LTKIDGDTRGGAALSVR+ITGKPIKFTGTGEKITDIETFHPDRM+SRILGMGD
Sbjct: 241

Query: 301
LLTLIE+ASQ+YDE++S ELAEKMREK+FDFNDFI+QLDQVQNMG MED+LKM+PGMANN
Sbjct: 301

Query: 361 PAMKNPKVDEHEIMKiaiVSSOTPSERENPDLmPSRRRRIAMSGOTFVDVNKFIKDF. 420

PA+ N +VDE EIARKRAIVSSMTPEERENPDLL PSRRRRIA+GSGNTFV+VNKFIKDF 
Sbjct: 361 PALAMVEVDEGEIARKRAIVSSMTPEERENPDLLTPSRRRRIASGSSOTFVNVNKFIKDF 420 

Query: 421 NQAKQMMQGVMSGDMNKMMKKMGIDPNNLPKDMPGMDC-MDMSNLEGMMGQNGMPDLSSL- 479 

NQAK+MMQGVMSGDMNK+MK+MGI+PNN+P + MD S LEGMMGQ GMPD+S L 

Sbjct: 421 NQAKKMMQGVMSGDMNKVMKQMGINPNNMP NNMDSSALEGMMGQGGMPDMSGLS 474 

Query: 480 GGDMDFSQMFGGGLKGKVGAFAAKQSMKRMANKMKKAKKKRK 521 

G +MD SQMFGGGLKGKVG FA KQSMK+MA +MKKAKK++K 
Sbjct: 475 GANMDVSQMFGGGLKGKVGEFAMKQSMKKMAKRMKKAKKRKK 516 

A related DNA sequence was identified in S.pyogenes <SEQ ID 2089> which encodes the amino acid
sequence <SEQ ID 2090>. Analysis of this protein sequence reveals the following:

Possible site: 53 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -0.22 Transmembrane 39 - 55 ( 39 - 55)

Final Results

bacterial membrane --- Certainty=0.1086 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 458/522 (87%) , Positives = 489/522 (92%) , Gaps = 4/522 (0%) 

Query: 1 MAFESLTERLQGVFKNIRGKKKLSEKDVQEVTKEIRLALLFADVALPWKTFIKHVRERA 60 

MAFESLT+RLQ VFK+IRGKKK1SE DVQEVTKEIRLALLEADVALPWKTFIK VRERA 
Sbjct: 3 MAFESLTQRLQDVFKHIRGKKKLSESDVQEVTKEIRLALLEADVALPWKTFIKRVRERA 62 

Query: 61 VGHEIIDTLDPTQQIVKIVNEELTDLLGAETSEIEKSPKIPTIIMMVGLQGAGKTTFAGK 120 

+GHEIIDTLDPTQQI+KIVNEELT +LG+ET+EI+KSPKIPTIIMMVGLQGAGKTTFAGK 
Sbjct: 63 IGHEI IDTLDPTQQILKIVNEELTSILGSETAEIDKSPKIPTI IMMVGLQGAGKTTFAGK 122 

Query: 121 IJWKLIKEDNARPMMIftADIYRPAAIDQLKTLGSQINVPVFDMGTNHSAVEIVTKGLEQA 180 

LANKLIKE+NARP+MIAAD1YRPAAIDQLKTLG QINVPVFDMGT+HSAV+IV KGLEQA 
Sbjct: 123 LANKLIKEENARPLMIAADIYRPAAIDQLKTLGQQINVPVFDMGTDHSAVDIVRKGLEQA 182 

Query: 181 RENRNDYVLIDTAGRLQIDATLMQELHDVKAIAQPNEILLVVDSMIGQEAANVAEEFNRQ 240

REN NDYVLI DTAGRLQI D LM EL DVKA+AQPNEILLWDSMIGQEAANVA EFN Q 
Sbjct: 183 RENHNDYVLIDTAGRLQIDEKLMGELRDVKALAQPNEILLVVDSMIGQEAfiNVAYEFNHQ 242 

Query: 241 LSISGWLTKIDGDTRGGAALSVREITGKPIKFTGTGEKITDIETFHPDRMASRILGMGD 300 

LSI+GWLTKIDGDTRGGAALSVREITGKPIKFTG GEKITDIETFHPDRM+SRILGMGD 
Sbjct: 243 LSITGWLTKIDGDTRGGAALSVREITGKPIKFTGIGEKITDIETFHPDRMSSRILGMGD 302 

Query: 301 LLTLIERASQEYDEKRSMEIAEKJ1RENTFDFOT3FIDQLDQVQNMGPMEDLLKMLPGMANN 360 

LLTLIE+ASQEYDEK+S+EIAEKMRENTFDFNDFI +QLDQVQNMGPMEDLLKM+PGMA N 
Sbjct: 303 LLTLIEKASQEYDEKKSLELASKMRENTFDFNDFIEQLDQVQNKGPMEDLLKMIPGMAGN 362 



Query: 361 I 

PA+ N KVDEN+IARKRAIVSSMT? ERENPDLLNPSRRRRIAAGSGN+FVD NKFIKDF 
Sbjct: 363 PAIANIKVDENQIARKRAIVSSMTPAERENPDLLNPSRRRRIAAGSGNSFVD-NKFIKDF 421 

Query: 421 NQAKQMQGVMSGDMNKMMKKMGIDPNNLPKDMPGMDGM-DMSNLEGMMGQNGMPDLSSL 479 

NQAK MMQGVMSGDM+KMMK MGI+PNNLPK+MP GM DMS+LEGMMGQ GMPDLS L 
Sbjct: 422 NQAKSMMQGVMSGDMSKMMKDMGINPNNLPKNMPA--GMPDMSSLEGMMGQGGMPDLSGL 479 



Query: 480 GGDMDFSQMFGGGLKGKVGAFAAKQSMKRMANKMKKAKKKRK 521

GGDMD SQ+FG G KGK+G FA KQ+MKR ANK+KKAKKKRK 
Sbjct: 480 GGDMDMSQLFGKGFKGKIGQFAMKQAMKRQANKLKKAKKKRK 521 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 679

A DNA sequence (GBSx0721) was identified in S.agalactiae <SEQ ID 2091> which encodes the amino 
acid sequence <SEQ ID 2092>. This protein is predicted to be SatD. Analysis of this protein sequence 
reveals the following: 

Possible site: 49

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -1.28 Transmembrane 3 - 19 ( 2 - 19)

Final Results

bacterial membrane --- Certainty=0.1510 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9811> which encodes amino acid sequence <SEQ ID 9812> 
15 was also identified. 



The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAG28336 GB:U88582 SatD [Streptococcus mutans] 
Identities = 106/222 (47%), Positives = 162/222 (72%), Gaps = 2/222 (0%) 

Query: 13 MYLALIGDIINSKQILERETFQQSFQQLMTELSDVYGEELISPFTITAGDEFQALLKPSK 72

+Y+A+IGD+I+SK I R Q+ + L+ +++ Y E L S FTIT GDEFQALL P+ 
Sbjct: 2 IYIAIIGDLISSKAITNRPKSQKQLKNLLNQINKKYKELLKSAFTITTGDEFQALLVPNP 61 



Query: 73 KVFQIIDHIQLALKPVNVRFGLGTGNIITSINSNESIGADGPAYWHARSAINHIHDKNDY 132

++FQIID I L KP +RFG+G+G+I+T IN +SIG+DGPAYWHAR+AI++IHDKNDY 
Sbjct: 62 QIFQIIDEIALGFKPYQIRFGVGSGSILTEINPEQSIGSDGPAYWHARAAIDYIHDKNDY 121 

Query: 133 GTVQVAICLDDEDQNLELTLNSLISAGDFIKSKWTTNHFQMLEHLILQDNYQEQFQHQKL 192

G+ +A+ L+D + + + +N++++A +FIKSKWT +++++ L+ Y+E+F H+K+
Sbjct: 122 GSNHLAVDLEDTETSQQ--INAILAACEFIKSKWTVTQYEVIDGLLQAGIYEEKFSHKKM 179

Query: 193 AQLENIEPSALTKRLKASGLKIYLRTRTQAADLLVKSCTQTK 234 

A+ ++ PS+ KRLK+SGLKIYLR + A LL+ + + K 
Sbjct: 180 AEKLDLSPSSFNKRLKSSGLKIYLRNKKVATTLLLNAIRKEK 221 
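
Because the alignment summaries in this text were recovered by scanning, it can help to re-derive the percentages from the raw counts. The sketch below parses a summary line such as "Identities = 106/222 (47%)" and recomputes each percentage. The helper names are hypothetical, and the rule that percentages are truncated rather than rounded is an assumption, not something stated in the source.

```python
import re

# One "Name = n/total (p%)" field of an alignment summary line.
FIELD = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)\s*\(\s*(\d+)\s*%\s*\)")


def parse_blast_summary(line):
    """Return {field: (count, length, reported_percent)} for one summary line."""
    return {name: (int(n), int(total), int(pct))
            for name, n, total, pct in FIELD.findall(line)}


def recomputed_percent(count, length):
    """Percentage truncated toward zero (assumed rounding rule)."""
    return (100 * count) // length


summary = parse_blast_summary(
    "Identities = 106/222 (47%), Positives = 162/222 (72%), Gaps = 2/222 (0%)")

for name, (count, length, reported) in summary.items():
    # Print the reported value next to the recomputed one for comparison.
    print(name, reported, recomputed_percent(count, length))
```

A mismatch between the reported and recomputed value does not prove a scanning error, since the rounding rule is assumed, but it flags lines worth checking against the original document.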

A related DNA sequence was identified in S.pyogenes <SEQ ID 2093> which encodes the amino acid 
sequence <SEQ ID 2094>. Analysis of this protein sequence reveals the following: 

N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.3744 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 94/213 (44%) , Positives = 137/213 (64%) , Gaps = 3/213 (1%) 

Query: 14 YLALIGDIINSKQILERETFQQSFQQLMTELSDVYGEELISPFTITAGDEFQALLKPSKK 73 

Y+ALIGDII SKQ+ +R Q++ + +L+ + +IS ++T GDEFQ L + 
Sbjct: 3 YIALIGDIIQSKQLTDRSKVQKTLAAYLDDLNKTFAPYIISKLSLTLGDEFQGLFQVDTP 62

Query: 74 VFQIIDHIQLALKPVNVRFGLGTGNIITSINSNESIGADGPAYWHARSAINHIHDKNDYG 133

+F +ID I + + +RFG+G G+I+T IN + SIGADGPAYWHAR AI +IH KNDYG 
Sbjct: 63 IFHLIDLINHHMD-IPIRFGVGVGSILTDINPDISIGADGPAYWHAREAIRYIHQKNDYG 121 

Query: 134 TVQVAICLDDEDQNLELTLNSLISAGDFIKSKWTTNHFQMLEHLILQDNYQEQFQHQKLA 193 

+A L N + LNSL++AGD IK+ W + +++ + L+ Y+E F Q+L

Sbjct: 122 NTTLA--LRTGHHNQDDVLNSLLAAGDAIKANSRASQWEIFDTLLDLGIYEEYFDQQRLG 179

Query: 194 QLENIEPSALTKRLKASGLKIYLRTRTQAADLL 226

+ ++ SAL+KRLK+S +KIYLRTR A + L 
Sbjct: 180 KQLSLSSSALSKRLKSSHVKIYLRTRQSALNCL 212 
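
The middle row of each alignment block marks identical residues with the residue letter and conservative substitutions with "+", so the Identities and Positives counts can be tallied from it. The sketch below rebuilds such a row for a short fragment of the alignment above and counts both. The function names are hypothetical, and the two "similar" pairs are a small illustrative stand-in for a real substitution matrix, not the scoring scheme actually used in the source.

```python
def build_midline(query, subject, similar):
    """Build a BLAST-style middle row for two equally long aligned rows."""
    chars = []
    for q, s in zip(query, subject):
        if q == s and q != "-":
            chars.append(q)          # identical residue
        elif frozenset((q, s)) in similar:
            chars.append("+")        # conservative substitution
        else:
            chars.append(" ")        # mismatch or gap
    return "".join(chars)


def midline_counts(midline):
    """Return (identities, positives) counted from a middle row."""
    identities = sum(1 for ch in midline if ch.isalpha())
    return identities, identities + midline.count("+")


# Illustrative similarity pairs only (assumption, not a full matrix).
similar = {frozenset("AS"), frozenset("LV")}

midline = build_midline("KRLKASGLKIYLRTR", "KRLKSSHVKIYLRTR", similar)
print(repr(midline))
print(midline_counts(midline))
```

Positives include the identities, which is why the second number is never smaller than the first.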

A related GBS gene <SEQ ID 8637> and protein <SEQ ID 8638> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 7
McG: Discrim Score: 4.96
GvH: Signal Score (-7.5): -5.46

Possible site: 49
>>> Seems to have an uncleavable N-term signal seq
ALOM program count: 1 value: -1.28 threshold: 0.0

INTEGRAL Likelihood = -1.28 Transmembrane 3 - 19 ( 1 - 19)
PERIPHERAL Likelihood = 5.99 74

modified ALOM score: 0.76

*** Reasoning Step: 3

Final Results

bacterial membrane --- Certainty=0.1510 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
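
The "Final Results" blocks follow a fixed pattern of localisation, certainty and verdict, so they can be read mechanically. The following sketch extracts the three values and picks the localisation with the highest certainty. It tolerates the stray spaces inside numbers that scanning introduces. The function name and the sample text are illustrative assumptions, not part of the original analysis pipeline.

```python
import re

# One localisation line; \W* skips dashes and spaces before "Certainty",
# and the number group allows embedded spaces such as "0 . 1510".
LINE = re.compile(
    r"bacterial\s+(\w+)\W*Certainty\s*=\s*([\d.\s]+?)\s*\(([^)]*)\)")


def parse_final_results(text):
    """Return {localisation: (certainty, verdict)} from a Final Results block."""
    results = {}
    for site, number, verdict in LINE.findall(text):
        results[site] = (float(number.replace(" ", "")), verdict.strip())
    return results


sample = """
bacterial membrane Certainty=0 . 1510 (Affirmative) < succ>
bacterial outside Certainty=0 . 0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
"""

results = parse_final_results(sample)
best = max(results, key=lambda site: results[site][0])
print(best, results[best])
```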

SEQ ID 8638 (GBS338) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 62 (lane 5; MW 30kDa). It was also expressed in E.coli as a GST-fusion
product. SDS-PAGE analysis of total cell extract is shown in Figure 68 (lane 11; MW 55kDa).

GBS338-GST was purified as shown in Figure 215, lane 3.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 680 

A DNA sequence (GBSx0722) was identified in S.agalactiae <SEQ ID 2095> which encodes the amino 
acid sequence <SEQ ID 2096>. Analysis of this protein sequence reveals the following: 

Possible site: 14
>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.6082 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 681 

A DNA sequence (GBSx0723) was identified in S.agalactiae <SEQ ID 2097> which encodes the amino 
acid sequence <SEQ ID 2098>. Analysis of this protein sequence reveals the following: 

Possible site: 30
>>> Seems to have a cleavable N-term signal seq.

Transmembrane 126 - 142
Transmembrane 45 - 61
Transmembrane 241 - 257 ( 236 - 257)
Transmembrane 199 - 215
Transmembrane 96 - 112

bacterial membrane --- Certainty=0.4949 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAG28337 GB:U88582 SatE [Streptococcus mutans]
Identities = 54/103 (52%), Positives = 70/103 (67%), Gaps = 2/103 (1%)

Query: 1 MISDFLRDNPILTLLFCAHFLADFQWQSQSLADSKSHSWRGLWRHLLIVFLPLAALMILI 60

+IS FL NP+LTLL AHFLADFQWQSQ +AD KS +W L RHL+IV LPL L ++I
Sbjct: 6 VISQFLSGNPVLTLLLIAHFLADFQWQSQKMADLKSSNWTYLIRHLIIVALPLILLSVVI 65

Query: 61 PETTLLNLSIWGSHIVIDSIKKLSYPWVEEGHF--QKAAFIID 101

P + L+ I+ SH++IDS K L + ++ F KA F+ID
Sbjct: 66 PHSFLVLSLIFLSHVLIDSGKILLNSFYKDRSFIKTKAVFLID 108

A related DNA sequence was identified in S.pyogenes <SEQ ID 2099> which encodes the amino acid 
sequence <SEQ ID 2100>. Analysis of this protein sequence reveals the following:



Possible site: 16
>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Transmembrane 125 - 141 ( 120 - 144)
INTEGRAL Transmembrane 222 - 238 ( 215 - 238)
INTEGRAL Transmembrane 47 - 63 ( 45 - 77)
INTEGRAL Transmembrane 179 - 195 ( 178 - 199)
INTEGRAL Transmembrane 67 - 83 ( 67 - 83)

Final Results

bacterial membrane --- Certainty=0.4036 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has no significant homology with any sequences in the GENPEPT database. 
An alignment of the GAS and GBS proteins is shown below:

Identities = 109/256 (42%), Positives = 146/256 (56%), Gaps = 28/256 (10%) 

Query: 2 ISDFLRDNPILTLLFCAHFLADFQWQSQSLADSKSHSWRGLWRHLLIVFLPLAALMILIP 61
+S +L P LTL H L+D+Q QSQ +AD K L HL+ V +PL L ++IP
Sbjct: 5 VSHYLAQTPTLTLFLICHVLSDYQLQSQQVADLKEKHLTYLGYHLIGVSIPLICLTLIIP 64

Query: 62 ETTLLNLSIWGSHIVIDSIKKL SYPWVEEGHFQKAAFIIDQLAHYTCIIVFYHALPT 118
+ L++L + SH +ID +K S W E F++DQ H L
Sbjct: 65 QAWLMSLLVMISHALIDWLKPKMANSLKWKREW I FLLDQCLHIAI SSFAGLRLAG 119

Query: 119
Sbjct:
Query: 179
Sbjct:
Query: 239
Sbjct: 220
Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 682 

A DNA sequence (GBSx0724) was identified in S.agalactiae <SEQ ID 2101> which encodes the amino
acid sequence <SEQ ID 2102>. Analysis of this protein sequence reveals the following:

Possible site: 30

>>> May be a lipoprotein

Final Results

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAD17886 GB:AF100456 hyaluronate-associated protein precursor 
[Streptococcus equi] 
Identities = 358/521 (68%) , Positives = 426/521 (81%) , Gaps = 2/521 (0%) 

Query: 1 MSSETOKKLKFLGISIATLTATTVTLVACGNESKNSGDNK^-INWYIPTEISTLDISKNT 59 

M+ K K LG++ TL A+ L+ACGN+ SDK INWY PTEI TLDISKNT 
Sbjct: 1 MTVLGTKACKRLGLAAVTL - ASVAALMACGNKQSASTDKKSE INWYTPTE I ITLDI SKNT 59 

Query: 60 DAYSNLAIGNSGSNLLRIDKEGKPKPDLAKKVSVSSDGLTYTATLRDNLKWSDGSKLSAE 119 

D YS LAIGNSGSNLLR D +GK +PDLA+KV VS DGLTYTATLRD LKWSDGS L+AE 
Sbjct: 60 DTYSAIjAIGNSGSNLLRADAKGKLQPDIAEKVDVSEDGLTYTATLRDGLKl'ISDGSDLTAE 119 

Query: 120 DFWTWRRIVDPKTASEYAYIATESHLLNADKINSGDIKDIjNKLGOTAKGNQvTFKLTSP 179 

DFVY+W+R+VDPKTASEYAYIATESHL NA+ INSG DL+ LGV A GN+V F LT P 
Sbjct: 120 DFWSWQRlWDPKTASEYAYrj,TESHLKNAEDINSGKNPDIJ5SLGVKADGNKVIFTLTEP 179 

Query: 180 CPQFKYYJ^FSNFMPQKQSYVEKVGKDYGTTSKNQIYSGPYLVKDWNGSNGKFKLVKNKY 239 

PQFK L+FSNF+PQK+S+V+ GKDYGTTS+ QIYSGPY+VKDWNG++G FKLVKNK 
Sbjct: 180 APQFJ«LLSFSNPVPQKESFVKDAGKDYGTTSEKQIYSGPYIVKDi™GTSGTFKLVKM<N 23 9 

Query: 240 YWDSKHVKTNSVIVQTIKKPDTAVQMYKQGQIDFAEISGTSAIYQANKNNKDWDASDAR 299 

YWD+K+VKT +V VQT+KKPDTAVQMYKQG++DFA ISGTSAIY ANK +KDW +A 
Sbjct: 240 YWDAKNWTETVOTQTVKKPDTAVQmKQGKI.DFANISGTSAIYNANmi 299 

Query: 300 TTYIIYNQTGSVKALTNQKIRQAimATDRKGVVKAAVDTGSTPAESLVPKKIAKLPNGE 359 

T YI+YNQTG+++ h + KIROALNLATDRKG+V AAVDTGS PA +LVP LAKL +G 
Sbjct: 300 TAYIVYNQTGAIEGLNSLKIRQALNLATDRKGIVSA^VDTGSKPATALVPTGLAKLSDGT 359 

Query: 360 DLSKYTAPGYTYNTSKAQKLFKEGLAEVGQSSLKLTITADSDSPAAKNAVDYVKSTWESA 419 

DL+++ APGY Y+ +A KLFKEGLAE+G+ +L +TITAD+D+PAAK+AVDY+K TWE+A 
Sbjct: 360 DLTEHVAPGYICYDDimAKLFKEGLAELGKDALTITlTADADAPAAKSAVDYiraTWETA 419 

Query: 420 LPGLTVEEKFVTFKQRLEDAKl'IENFDVVLFSWGGDYPEGSTFYGLFTTNSAYNYGKFSSK 479 

LPGLTVEEKFV FKQRLED KN+NF+V + WGGDYP+GSTFYGLF + SAYNYGKF++ 
Sbjct: 420 LPGLTVEEKFVPFKQRLEDTKNQNFEVAVVLWG3DYPKGSTFYGLFKSGSAYNYGKFTNA 479 

Query: 480 EYDNAYQKAITTDALKPGDAANDYKTAEKALFDQSYYNPVY 520 

+YD AY KA+TTDAL A&+DYK AEKAL+D + YNP+Y 
Sbjct: 480 DYDAAYNKALTTDAENTDAAADDYKAAEKALYDNALYNPLY 520 

There is also homology to SEQ ID 318. An alignment of the GAS and GBS proteins is shown below: 

Identities = 138/524 (26%), Positives = 222/524 (42%), Gaps = 73/524 (13%) 

Query: 7 KKLKFLG-ISIATLTATTWLVACGNESKNSGDN--KVXNWYIPTEISTLD1SKNTDAYS 63 

KK K+L 4S+A L+ + L ACGN++ + G K + + +LD + 
Sbjct: 5 KKSKWLAAVSVA1LSVSA--LAACGNKNASGGSEATKTYKWFVNDPKSLDYILTNGGGT 62 
Query: 64 NLAIGNSGSNLLRIDKEGKPKPDLAKKVSVSSDGLTYTATLRDNLKW--SDGSK---LSA 118

I LL D+ G P LAK VS DGLTYT TLRD + W +DG + 4-4-A 

Sbjct: 63 TDVITQMVDGLLENDEYGNLVPSLAKDWKVSKDGLTYTYTLRDGVSWYTADGEEYAPVTA 122 

Query: 119 EDFVYTWRRIVDPKTASEYAYIATESHLIJJADKINSGDIKllLNKLGVTAKGNQ-VTFBCLT 177 

EDFV + VDK+ + Y E+N +G++ D ++GV A ++ V + L 

Sbjct: 123 EDFVTGLKHAVDDKSDALY---WEDSIKNI.KAYQNGEV-DFKEVGVKALDDKTVQYTLN 178 

Query: 178 SPCPQFKYYLAFSNFMPQKQSYVEKVGKDYGTTSKNQI-YSGPYLVKDWNGSNGKFKLVK 236 

P + +S P +++ GKD+GTT + I+GY+ + S + K 

Sbjct: 179 KPESYWNSKTTYSVLFPVNAKFLKSKGKDFGTTDPSSILVNGAYFLSAFT-SKSSMEFHK 237 

Query: 237 NKYYWDSKHWTNSV--IVQTIKKPDTAVQMYKQGQIDFAEISGTSAIYQ-ANKNNKDW 293 

N+ YWD+K+V SV P + + + +G+ A + Y+ A KN D + 

Sbjct: 238 NENYWDAKWGIESVKLTYSDGSDPGSF\™FDKGEFSVAELYPNDPTYKSAKKNYADNI 297 

Query: 294 D ASDARTTYI I YN QTGSVKALTNQKIRQALNLATDRKG--- 331 

D R ++ +N Q KAL N+ RQA+ A DR 

Sbjct: 298 TYGMLTGDIR- -HLTWMJlffiTSFKNTKKDPAQQDAGKXALiNNKDFRQAIQFAFDRASFQA 355 

Query: 332 - WKAAVDTGSTPAESLVPKKLAKL - PNGEDLSKYTAPGYTYNTS 374 

V V G + S V K++AKL +D++ A YN 
Sbjct: 356 QTAGQDAKTKALRNMLVPPTFVTIGESDFGSEVEKEMAKLGDEWKDVNLADAQDGFYNPE 415 

Query: 375 KAQKLF KEGLAEVGQS-SLKLTITADSDSPAAKNAVDYVKSTWESALPGLTV 425 

KA+FKELG+++L D+A K + E++L V 

Sbjct: 416 KAKT^EFAKAKETALTAFXSVTFPVQLDYPVDQANAATVQERQSFKQSVEASLGKENVIVNVL 475 

Query: 426 EEKFVTFKQR LEDAKNENFDWLFSWGGDYPEGSTFYGLFT 466 

E + T + + E + +++D++ WG DY + T+ + + 
Sbjct: 476 ETETSTHEAQGFYAETPEQQDYDIISSWWGPDYQDPRTYLDIMS 519 

SEQ ID 2102 (GBS323) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 62 (lane 4; MW 61.3kDa). 

The GBS323-His fusion product was purified (Figure 209, lane 5) and used to immunise mice. The 
resulting antiserum was used for FACS (Figure 306), which confirmed that the protein is immunoaccessible 
on GBS bacteria. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 683 

A DNA sequence (GBSx0725) was identified in S.agalactiae <SEQ ID 2103> which encodes the amino 
acid sequence <SEQ ID 2104>. Analysis of this protein sequence reveals the following: 

Possible site: 60

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -1.54 Transmembrane 199 - 215 ( 198 - 215)

Final Results

bacterial membrane --- Certainty=0.1617 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAC17173 GB:AF065141 unknown [Streptococcus mutans]
Identities = 304/356 (85%) , Positives = 334/356 (93%) 

Query: 1 MKRELLLEKIDELKEIMPWYVLEYYQSKLSVPYSFTTLYEYLKEYRRFLEWLLDSGVANC 60 

M+RELLLEKIDELKE+MPWYVLEYYQSKL+VPYSFTTLYEYLKEYRRF EWL+DSGV+N 
Sbjct: 1 MRRELLLEKIDELKELMPWYVLEYYQSKLTVPYSFTTLYEYLKEYRRFFEWLIDSGVSNA 60 
Query: 61 HHIAEIELSVLENLTKKDMEAFILYLRERPLLNaNTRQNGVSQTTINRTLSALSSLFKYL 120

+ +A+I L LE+L+KKDME+FILYLRER LLN ++ GVSQTTINRTLSALSSL+KYL
Sbjct: 61 NKLADIPLETLEHLSKKDMESFILYLRERTLLNTIOTKRQGVSQTTINRTLSALSSLYKYL 120

Query: 181 KLSKRMAFFNKNKERDIAIIALLLASGWLSEAVNLDLKDINLNVIW^ 240

KLSKRAL+ F KNKERDLAI IALLIASGVRLSEAVNLDLKD+NLN+M+I4VTRKGGK DS
Sbjct: 181 KLSKRALSSFRKNKERDLAI IALLLASGVRLSEAWLDLKDVNLNMMI IEVTRKGGKHDS 240

Query: 241 VNVASFAKPYIiMWLDIRKWYKAENQDIAiFLSEYRGVPNRIDASSVEKMVAKYSQDFK 300

VNVA FAKPYL NY+ IR+ RYKA+ D+A FLSEYRGVPNR+DASS+EKMVAKYSQDFK
Sbjct: 241 V1WAGFAKPYLENYITIRRGRYKAKKTDIAFFLSEYRGVPNRMDASSIEKMVABCYSQDFK 300

Query: 301 VRVTPHKLRHTLATRLYDATKSQVLVSHQLGHASTQVTDLYTHIVNDEQKNALDKL 356

+RVTPHKLRHTLATRLYDATKSQVLVSHQLGHASTQVTDLYTHIVNDEQKNALDKL
Sbjct: 301 IRVTPHKLRHTLATRLYDATKSQVLVSHQLGHASTQVTDLYTHIVNDEQKNALDKL 356



A related DNA sequence was identified in S.pyogenes <SEQ ID 2105> which encodes the amino acid
sequence <SEQ ID 2106>. Analysis of this protein sequence reveals the following: 

Possible site: 48 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -1.54 Transmembrane 211 - 227 ( 210 - 227) 



Final Results

bacterial membrane --- Certainty=0.1617 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



A related sequence was also identified in GAS <SEQ ID 9139> which encodes the amino acid sequence
<SEQ ID 9140>. Analysis of this protein sequence reveals the following:



Possible cleavage site: 60 
Seems to have no N-terminal signal sequence 
INTEGRAL Likelihood = -1.5' 



• 215 ( 198 - 



Final Results 

bacterial membrane --- Certainty= 0.162 (Affirmative) < succ>

bacterial outside --- Certainty= 0.000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty= 0.000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 283/356 (79%) , Positives = 321/356 (89%) 







Sbjct: 13

Query: 61
IA+I+LS LE+LTKKD+EAF+LYLRERP IN ^ G+SQTTINRTLSALSSL+KYL
Sbjct: 73

Query: 121
TEEVEN GEPYFYRNVMKKVSTKKKKETLASRAENIKQKLFLG+ET+ FL+Y+D EY+
Sbjct: 133 TEEVENDCGEPYFYRNVMKKVSTKKKKETI^ASRAENIKQKLFLGDETLAFLDYVDKEYEQ 192

Query: 181 KLSKRAIAFFNKNKERDIAIIALLIiASGWLSEAVNTjDLTO 240
KLS RA + F KNKERDLAI I ALLLASGVRLSEAVNLDLKD+NLN+M+ 1 +V RKGGKRDS
Sbjct: 193 KLSNRAKSSFRKNKERDLAIIALLIASGVRLSEA^T^DLKDVNLNMMIIEVIRKGGKRDS 252

Query: 241 VNVASFAKPYIiANYLDIRKNRYKAF^QDIALFLSSYRGVPNRIDASSVEKWAKYSQDFK 300



VNVA FAK YL +YL +R+ RYKAE QD+A FL+EYRGVPNR+DASS+EKMV KYS+DFK 
Sbjct: 253 V1STVAGFAKGYLESYIATOQRRYIOVEKQDIAFFLTSYRGVPNRMDASSIEKMVGKYSEDFK 312 

Query: 301 VRVTPHKLRHTLATRLYDATKSQVLVSHQLGHASTQVTDLYTHIVNDEQKNALDKL 356
+RVTPHKLRHTLATRLYDATKSQVLVSHQLGH+STQVTDLYTHIVNDEQKNALD L

Sbjct: 313 IRVTPHKLRHTLATRLYDATKSQVLVSHQLGHSSTQVTDLYTHIVNDEQKNALDNL 368

SEQ ID 2104 (GBS420) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 172 (lane 5; MW 68kDa).

GBS420-GST was purified as shown in Figure 219, lane 9-10.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 684 

A DNA sequence (GBSx0726) was identified in S.agalactiae <SEQ ID 2107> which encodes the amino
acid sequence <SEQ ID 2108>. This protein is predicted to be a sensor-like histidine kinase in idh 3'region.
Analysis of this protein sequence reveals the following:

Possible site: 24

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -7.75 Transmembrane 10 - 26 ( 8 - 34)
INTEGRAL Likelihood = -3.93 Transmembrane 37 - 53 ( 35 - 54)

Final Results

bacterial membrane --- Certainty=0.4100 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
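
The INTEGRAL lines above list each predicted membrane-spanning segment with a likelihood score, a residue range and a second range in parentheses. A minimal sketch for turning such lines into structured records follows. The dictionary keys ("core", "parenthesised") are hypothetical labels chosen here for illustration; the source does not name the two ranges.

```python
import re

# One "INTEGRAL Likelihood = x Transmembrane a - b ( c - d)" line.
SEGMENT = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+(?:\.\d+)?)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)")


def parse_segments(text):
    """Return the segments found in text, sorted by start position."""
    segments = []
    for score, start, end, lo, hi in SEGMENT.findall(text):
        segments.append({
            "likelihood": float(score),
            "core": (int(start), int(end)),
            "parenthesised": (int(lo), int(hi)),
        })
    return sorted(segments, key=lambda seg: seg["core"][0])


sample = """
INTEGRAL Likelihood = -7.75 Transmembrane 10 - 26 ( 8 - 34)
INTEGRAL Likelihood = -3.93 Transmembrane 37 - 53 ( 35 - 54)
"""

segments = parse_segments(sample)
for seg in segments:
    start, end = seg["core"]
    # Each listed range in this sample spans 17 residues.
    print(seg["likelihood"], start, end, end - start + 1)
```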

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAB16001 GB:Z99124 similar to two-component sensor histidine 
kinase [YxdJ] [Bacillus subtilis] 
Identities = 96/320 (30%), Positives = 172/320 (53%), Gaps = 15/320 (5%)



IRQFLREHLIWYILYIM- -MFVLFFISFYLYHLPMPYLFNSLGLNVIVLLGISIWQYSRY E 
++ FLR H + 4L+++ +FV F+ F H +LF LG+ +++L G +++ + 
MKLFLRSHAVLILLFLLQGLFVFFYYWFAGLH-SFSHLFYILGVQLLILAGYLAYRWYKD 5 



Query: 118 WSHQMKVPLAAISLMAQTNHLDP--KEVEQQLLKLQHYLETLLAFLKFRQYRDDFRFEAV 175

W HQ+K PL+ I+L+ Q +P +++++++ +++ LETLL + + DF+ EAV 
Sbjct: 119 WVHQVKTPLSVINLIIQEED-EPVFEQIKKEVRQIEFGLETLLYSSRLDLFERDFKIEAV 177 

Query: 176 SLREVWEIIKSYKVICLSKSL- - SI I IEGDNIWKTDKKWLTFALSQVLDNAIKYSNPES 233 
SL E++ +I+SYK + + + + D+ TD KWL FA+ QV+ NA+KYS +S

Sbjct: 178 SLSELLQSVIQSYKRFFIQYRVYPKMNVCDDHQIYTDAKOTLKFAIGQWTNAVKYSAGKS 237 

Query: 234 KIIISIGEESIRIQDYGIGILESDIPRLFEDGFTGYNGHEHQKATGMGLYMTKEV 288 

+ + ++DYG+GI +DI R+F4 +TG NG Q++TG+GL++ KE+ 

Sbjct: 238 DRLEIOTFCDEDRTVLEVKDYGVGIPSQDIKRVFDPYYTGENGRRFQESTGIGLHLVKEI 297



Query: 289 LSSLNLSISVDSKINYGTAV 308

LN ++ + S GT+V
Sbjct: 298 TDKLNHTVDISSSPGEGTSV 317



SEQ ID 2108 (GBS421) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 172 (lane 6; MW 63kDa). 
GBS421-GST was purified as shown in Figure 219, lane 11.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 685 

A DNA sequence (GBSx0727) was identified in S.agalactiae <SEQ ID 2111> which encodes the amino 
acid sequence <SEQ ID 2112>. Analysis of this protein sequence reveals the following: 

N-terminal signal sequence 

Final Results

bacterial cytoplasm --- Certainty=0.1310 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:AAD10258 GB:AF036964 putative response regulator [Lactobacillus 

Identities = 94/222 (42%) , Positives = 140/222 (62%) , Gaps = 8/222 (3%) 

Query: KIYIVEDDMTIVSLLKDHLSASYHVSSV--SNPRDVKQEIIAFQPDLILMDITLPYFNGF 64
+I IVEDD TI +L+ ++L + + ++ +F + + +P L+L+DI LP ++GF
Sbjct: 3 EIMIVEDDPTIANLIAENLE-KWQLKAIIPDDFDTIFDRFLTDKPHLVLLDINLPVYDGF 61

Query: 65 LTIPIIFISSSNDEMDMVMRLNMGGDDFISKPFSLAVLDAKLTAILRRSQQ 124
+PIIFISS + MDMVM++NMGGDDF++KPFS+ VL AK+ A+LRR+
Sbjct: 62 SKVPIIFISSRSTNTOMVMSI^GGDDFVNKPFSMEVLIAKINALLRRTYN 121

Query: 125 TFGGFTLT-REGLLSSQDKEVILSPTENKILSILLMHPKQWSKESLLEKli 180
G + + G D V LS E K+L L+ Q+VS+E LL L
Sbjct: 122

Query: 181
W+++ F+D NTL VN+ RLRKKI G DYI T G GY++
Sbjct: 182 WDDERFVDDNTLTVNINRLRKKIEQAGLEDYIQTKIGQGYII 223

There is also homology to SEQ ID 1182.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.



Example 686 

A DNA sequence (GBSx0728) was identified in S.agalactiae <SEQ ID 2113> which encodes the amino
acid sequence <SEQ ID 2114>. This protein is predicted to be permease OrfY. Analysis of this protein
sequence reveals the following:

Possible site: 37

>>> Seems to have no N-terminal signal sequence

INTEGRAL Transmembrane 55 - 71 ( 49 - 75)
INTEGRAL Transmembrane 197 - 213 ( 192 - 218)
INTEGRAL Transmembrane 152 - 168 ( 141 - 172)
INTEGRAL Transmembrane 624 - 640 ( 619 - 645)
INTEGRAL Transmembrane 222 - 238 ( 219 - 250)
INTEGRAL Transmembrane 283 - 299 ( 280 - 307)
INTEGRAL Transmembrane 533 - 549 ( 526 - 552)
INTEGRAL Transmembrane 108 - 124 ( 99 - 140)
INTEGRAL Transmembrane 585 - 601 ( 581 - 610)
INTEGRAL Transmembrane 25 - 41 ( 21 - 47)
INTEGRAL Likelihood = -0.48 Transmembrane 602 - 618 ( 602 - 618)

Final Results

bacterial membrane --- Certainty=0.5649 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9809> which encodes amino acid sequence <SEQ ID 9810> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database:

40/665 (6%) 

Query: 4 MPYLKIAWHNLKHSIDQYIPFLLASLLLYSLTCSTLLILMSAVGRDMGTAAT VLFLG 60 

MF KI++HNL + +P4 + + L + ++ TA +L G 

Sbjct: 1 MFLPKISFHNLIVNKSLTLPYFAIMTIFSGFJIYVLINFLTNPSFYNIPTARILIDILIFG 60 

Query: 61 VIVLS I FAVVMEHYSYNILMKQRSSEFGLYNILGMNKRQVARVASLELFI I YI FLI S1GS 120 

I++S+ ++ Y+ + +R+S G44 +LGM K+Q+ ++ LE ++ G 
Sbjct: 61 FlLISLLMLLYGRYAHRFISDERNSNMGIFLMLGMGKKQLIiKIIYLEKLYLFTGTFFGGIj 120 

Query: 121 LFSAFFAKFIYLIFVNIINYHALNLSLSLWPFIICIVIFTGIFLTLEVPVIRHVHLSSPL 180 

+F ++K 4L N+I + SL 444 14 4 4 R 4 S 

Sbjct: 121 IFGFVYSKIFFLFIRNLIVIGDVREQYSLTAISWLLILTFFIYFIIYLSEYRLLKRQSIT 180 

Query: 181 SLFRKKQQGEKEPKGNLILAILALVAIAIAYTMALTSGKAPALAVIY-RFFFAVLLVIAG 239 

+F K++ K++ + + LA++Y ALTS P + + RF +A LV G 
Sbjct: 181 VI FNSKAKRDNPRKTSVFVGLFGLFALLMGYHFALTS PNVTTSFSRFI YAACLVTLG 237 

Query: 240 TYLFYISFMTWYLKRLRQNKHYYYKSEHFVSTSQMIFRMKQNAVGLASITLLAVMALVTI 299 

+ + S + L 444 4 YY FV 4 4 R44 NA4 LA4I 4 + LV44 

Sbjct: 238 IFCTFSSGVINLLTVIKKRRAIYYNQRRFWIASLFHRIRSKALSLATICIFSTATLVSL 297 

Query: 300 ATWSLYSNTQNWTGLFPKSVSLSIDNSKGDAKNIFEEKILKKLGKSSKEAITYNQTMI 359 

4 . SLY N4V P4 V44 S D E L 4 4 4T Q 
Sbjct: 298 SVIASLYIiAKDN^LSSPRDVTVL---STTDI EPNLMDIATKNHVTLTNRQ 346 

Query: 360 SMPVSQSSELNITSKNVKHVDITKTGFMY LITQNDFRRLGHQLPKLKDNQVAYF 413 

4+ VSQS NI H4 4 G M 414 4 F 4 4LK4444 4 

Sbjct: 347 NLKVSQSVYGNI KGS HLS VDPNGGMAKDYQITVI SLDSFNASNNTHYRLKNHE ILTY 403 

Query: 414 VQKGDSRLKKINLLGNKFDWKNLKEA- YVPETTNTYNPGLI I FANNKQI -DNIRKAYLP 471 

V G 4 GKVK4K 44 4PI 4N44I I K L 

Sbjct: 404 VSNGAAAPSSYTTNGVKLTNVKQI KRINF I FSPLRSMQPNFFI ITDNREI IQTILKEELT 463 

Query: 472 YTKNINTFPKTFKAYLDLNSQEINSISKKDIIEVDG- -KYVGNISTKQSFLKEGYQMFGG 529 

4 T Y + +44N D 4E 44 N4 + + 4FGG 

Sbjct: 464 WG TF4AGY - HVKGKKMNQKDF YDELETTNFRQFSA1TWS IRQVKSMFNALFGG 514 

Query: 530 LLFTGFLLGISFLLGIALIVYYKQYSEGHEDKRSYRILQEVGMSKKLVKRTINSQIMIFF 589 

LLF G 4 G F 4 A4 4YY4Q SEG D4 Y4 4 44GM4 K 4+ 4l QI F 
Sbjct: 515 LLFVGIIFGTIFAILTAITIYYQQLSEGIRDRDDYKAMIKLGMTNKTIQDSIKVQINFVF 574 

Query: 590 FQPLWAVIEFGVAIPMLKQMDLVFGVmSTIVYWSGLTVLAISIIYFIlYRITSRTYY 649 

P4 A+44 A4P4L 444 FG 44 4 G 44 Y+ I TS4 YY 

Sbjct: 575 ILPIAFALLNLIFALPILYKIMTTFGFMDAGLFTjRAVGTCLIVYLFFYWFICHCTSKLYY 534 

Query: 650 HIIER 654 
41 4 

Sbjct: 635 RLISK 639 



A related DNA sequence was identified in S.pyogenes <SEQ ID 2115> which encodes the amino acid 
sequence <SEQ ID 2116>. Analysis of this protein sequence reveals the following: 
Likelihood =-13.
Likelihood =-12. 
Likelihood =-12. 
Likelihood = 
Likelihood = 
Likelihood = 
Likelihood = 
Likelihood = 
Likelihood = 
Likelihood = 



Transmembrane 
Transmembrane 
Transmembrane 
Transmembrane 
Transmembrane 



95 Transmembrane 109 



• 618 ( 592 • 

■ 75 ( 50 - 

■ 251 ( 224 • 
175 ( 146 - 

■ 217 ( 198 ■ 

• 526 ( 507 - 

■ 585 ( 564 ■ 

• 125 ( 102 • 

■ 310 ( 290 - 

■ 142 ( 126 - 



262) 
177) 
223) 
54 0) 
589) 
138) 
315) 
142) 



Final Results

bacterial membrane --- Certainty=0.6434 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases: 

>GP:BAB03337 GB:AB035452 ABC transporter [Staphylococcus aureus] 
Identities = 141/657 (21%) , Positives = 289/657 (43%) , Gaps = 56/657 (10%) 

I N+++N Y +Y L S+F + + + S +T+ + + +1 G+L 

Sbjct: 6 IVFKNLRQNLKHYAMY LFSLFFSIVLYFSFTTLQFTKGVHNDDSMAIIKKGALV 59 

Query: 62 - - IFLIVFLWFLI YFNNFFVKKRSQELGVIAILGFSKRELTKLLTLENLVILVLSYLVS 119 

IFL + +V+FL+Y N+ FVK+R++E + ++G +++ + K+L LE +++ +++ +V 
Sbjct: 60 GSIFLFIIIVIFLMYANHLFVKRRTREFALFQLIGLTRQNILKMLALEQMIVFLITGWG 119 

Query: 120 LLLGPTLYFIiAVIAITHLLNLTMEVQWFITVNEIlESLGILVWFLINVITNGLIISKQS 179 

+L G L + ++ L++L++ + ++ ++ +L++ +++ + + L + ++S 

Sbjct: 120 VLCGIAGAQLLLSIVSKLMSLSINLSIHFEPMALVLTIFMLIIAYVLILFQSALFLKRRS 179 

Query: 180 LIEFVNFSRKAE KK1KIRKVRAIIAITALLLSYILCLATVFSSTRNMLLSIGMVPV 235 

++ + SK+ K +++I+LY +AT T L P 

Sbjct: 180 ILSMMKDSIKTDATTAKVTTAEVISGVLGIAMIALGYY- -MATEMFGTFKALTMAMTSP- 236 

Query: 236 SLLI IVLWLGTVFTIRYGLAFWSLLKENKKRLYRPLSNI IYPKFNYRIATKNKLLTVL 295 

+1+ L V+G R ++ + LK++K + YR+ LT++ 

Sbjct: 237 -FIILFLTWGAYLFFRSSVSLIFKTLKKSKNGRVSITDWFTSSIMYRMKKNAMSLTII 295 

Query: 296 GGLLTVTV'SVAGMMVMLYAYSLNGIERLTPSMEYNVESENGQVNVTTILENDQVSL 352 

+ VTV+V + + 4- + + P+ E+NV + T L Q++ 

Sbjct: 296 AI ISAVTVTVLCFAALSKSNTDQTLTSMAPN - - EFNWATQDAKQFETKLSQQQITFSKN 353 

Query: 353 VDVGLLRLNTIPEVTITDSGQTIPYFDIINYSDYKELMKAQGRTNSIEGSKSLPLL 408 

+ V ++ I +DSG+T N KG I +KSLP + 

Sbjct: 354 AYETITVDNVKDQVITLENGSDSGRTNSILSANN- - - KVTGNNAI ITNTKSLPNI 405 

Query: 409 INYYPTEISLGKTFNLGNAYDVT- -VKQVSTNNVFSFSTSVTTLV- -VSDKLYAKLSSRF 464 

IN ILK + +TVQ V++S+VVS + Y+L + 

Sbjct: 406 IN IHLNKDLWKGTKNETFRVTQEDKGRVYPLNLSFNSPVVEVSPEKYQQLKT-- 458 

Query: 465 PEKEMTIRTFNGTSIR -SSERFYNQFSMVPDVISSYSKEHTVKTANIATYIFIT- 517 

+ + TF G 1+ ++A QF D + +Y + A IF+T 

Sbjct: 459 QHNVHTFYGYDIKQTSQKEKAQAIAKQFG DKVITYDEMKKEVDATNGILIFVTS 512 

Query: 518 FLSILFIICTGSILYFTSLIEIMENKEEYGYLSKLGYSKKMIHRILRYETGILFLIPVFI 577 

FL + F++ G 1+Y + E + + L ++G++ + + L + F +P+ I 
Sbjct: 513 FLGLAFLVAAGCIIYIKQMDETEDELSNFRILKRIGFTHTDMLKGLLLKITFNFGLPLLI 572 



Query: 578 GIVNGGMLLIYYKYLFMDTLVAGNIIMLSLLLCLLFFLIIYGTFYVLTLRLVTSIIK 634

I++ I + L GNI + +++ ++ + +IY TF ++ +IK

Sbjct: 573 AILHAVFARIAFMKLM GNISFMPVIWIWYTLIYITFALIAFVHSNKLIK 623
An alignment of the GAS and GBS proteins is shown below:

Identities = 145/678 (21%), Positives = 277/678 (40%), Gaps = 89/678 (13%)

NLKHSIDQYIPFLLASLLLYSLTCSTL LILMSAVGRDMGTAATVLFLGVIVLSIF 57 

N+K + Y + LA++ L S+ + L 1+ +G D G A + +1 L +F 

:GLLSIFIAFIiNFISDKIITEKIG-DSGQALVIANGSLIFLIVF 57 



W Y N +K+RS E G+ ILG +KR++ ++ +LE +1 -t 







Sbjct: 9
Query: 58
Sbjct: 68
Query: 128
Sbjct: 124
Query: 182
Sbjct: 183
Query: 236
Sbjct: 243
Query: 296
Sbjct: 300
Sbjct: 345
Query: 403
Sbjct: 400
Query: 463
Sbjct:
Query: 516
Sbjct: 513
Query: 576
Sbjct: 558
Query: 636
Sbjct: 617



--HLSSPLS 181 



h AI+A+ A+ ++Y -t 



NVTTI 344 

TYNQTMISMPVSQSSELNITSKNVKHVDITKTG FMYLITQNDFRRL GHQL 402 

N + + V + + V IT +G + +1 +D++ L + + 

LENDQVSLVDVGL LELNTIPEVTITDSGQTIPYFDIINYSDYKELMKAQGRTNSI 399 



E E+K Y L ++G SKK 



IF P+ + +++ G+ 4 



IY Y +T R 



A related GBS gene <SEQ ID 8639> and protein <SEQ ID 8640> were also identified. Analysis of this 
protein sequence reveals the following: 



Lipop: Possible site: -1 Crend: 
McG: Discrim Score: -11.64 
GvH: Signal Score (-7.5): -3.52 

Possible site: 37 
>>> Seems to have no N-terminal signal sequence 



ALOM program 
INTEGRAL 
INTEGRAL 



Likelihood =- 
Likelihood =- 
Likelihood = 
Likelihood = 
Likelihood = 
Likelihood = 



-11.62 threshold: 0.0 



62 



197 - 213 



624 - 640 
INTEGRAL
INTEGRAL 
INTEGRAL 



Likelihood = 
Likelihood = 
Likelihood - 
Likelihood = 
Likelihood = 



Transmembrane 533 - E 

Transmembrane 108 - 2 

Transmembrane 25 - 

Transmembrane 602 - 6 
129 



- 552) 

- 140) 

- S10) 



*** Reasoning Step: 3

Final Results

bacterial membrane --- Certainty=0.5649 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following 

ORF02245(310 - 2262 of 2562)
GP|9802356|gb|AAF99695.1|AF267498_5|AF267498 (1 - 639 of 640) permease OrfY {Streptococcus mutans}
%Match = 10.2
%Identity = 24.0 %Similarity = 49.8
Matches = 147 Mismatches = 21



QKTC*IYLKLLTWMDKLF*W*PIQQMLLVMPNAFYLSKMDVFFTNFIWIRIIANSIKIFL*QCLPY*GVNNMFYLKIAW 

II II- 
MFLPKISF 



HNLKHSIDQYIPFLLASLLLYSLTCSTLLILMSAVGRDMGTAAT---VLFLGVIVLSIFAWMEHYSYHILMKQRSSEFG 
III = s|:= : = I = = = II =1 =1 l==l== = = 1= = =1=1 I 

HNLIWKSLTLPYFAIMTIFSGFNYVLINFLTNPSFYNIPTARILIDILIFGFILISLLMLLYGRYANRPISDERNSMMG 



LYNILGIWKRQVARVASLELFIIYIFLISIGSLFSAFFAKFIYLIFVNIINYHALNLSLSLWPFIICIVIFTGIFLTLEV 
:= =111 l = |: = = II = = = I = I -I =1 1 = 1 = II = = = l = = = = 

IFLMLGMGKKQLLKIIYLEKLYLFTGTFFGGLIFGFVYSKIFFLFIRNLIVIGDVREQYSLTAISWLLILTFFIYFIIYL 



PVIRHVHLSSPLSLFRKKQQGEKEPKGNLI3AIIJMLVAIAIAYTMALTSGKAPALAVIY-RFFFAVLLVIAGTYLFYISF 

| : | :| | : : | :: : :: | |: : | |||] | : s || :| || | = : | 
SEYRLLKRQSITVIFNSKAKRDNPRKTSVFVGLFGLFALLMGYHFALTS - - - PNVTTSFSRFIYAACLvTLGIFCTFSSG 
180 190 200 210 220 230 240 

1071 1101 1131 1161 1191 1221 1251 1281 

MTWYLKRLRQNKHYYYKSEHFVSTSQMIFFJ^KQNAVGIASITLI^.V^^ 

: I ::: : II II : = l = = 11 = 11 = 1 = = = ll = = = II! 1 = 1 1= I 

VIMLLTVIE^CRPJ^IYYNQRRFWIASLFHRIRSNALSLATICIFSTATLVSLSVIiASLYIiAKDMWRLSSPRDV- 

260 270 280 290 300 310 

1311 1341 1371 1401 1431 1461 

SKGDAKNIFEEKILKKLGKSSKEAITYNQTMISMPVSQSSELNITSKN\TCH\TJITKTGFM 

l = :| = = -I =11 II =1 = 

TVLSTTDIEPNLMDIATKN- - HVTLTNRQNLKVSQS VYGNI KGSHLS VDPN 

320 330 340 350 360 

1464 1494 1524 1554 1584 1614 1641 1671 

YLITQNDFRRLGHQLPKLKDNQVAYFVQKGDSRLKKINLLGKKFD\ r VKNLKEA-YVPETTNTYNPGLIIFA 

:|= : I = :||::== =11= I 1= II =1 == ' I = I 

GGMANDYQITVISLDSFNASNlTTHYRLKNHEILr/VSNGAAAPSSYTINGVTOjTNTO 

380 390 400 410 420 430 440 

1698 1728 1758 1788 1818 1842 1872 1902 

NNKQI - DNIRKAYLPYTKNINTFPKTFKAYLDLNSQE I NS I S KND 1 1 E VDG - - KYVGNI STKQSFLKEGYQMFGGLLFTG 
:|::| | | | : 11 = = - = I I : I = = 1= » = =111111 I 
DMREIIQTILKEELTWG -TMAGY-HVTCGfCKMNQKDFYDELHITCNFRCFSANWSIRQVKSMFKALFGGLLFVG

450 470 480 490 500 510 

1932 1962 1992 2022 2052 2082 2112 2142 

FLLGISFLLGIALIVYYKQYSEGHEDKRSYRILQEVGMSKKLVKRTINSQIMIFFFQPLWAV1HFGVAIPMLKQMLLVF 
::| I = 1= =11 = 1 III I: h = = = lh I =: =1 II I h 1 = ::: hhl 1 
IIFGTIFAILTAITIYYQQLSEGIRDHDDYKAIviIKIGMIWKTIQDSIKVQINFVFILPIAFALLWLIFALPILyKIMTTF 
530 540 550 560 570 580 590 

2172 2202 2232 2262 2292 2322 2352 2382 

GVLNSTIVYWSGLTVLAISIIYFIIYRITSRTYYHIIER*KGLVILPILLH**KPID*KICYTK*KKEISYYFRRGYVT 
| :: : | :: |: | ||: || :| : 

GFNDAGLFLRftVGTCLIVYLFFYWFICHCTSKLYYRLISKK 



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.



Example 687 

A DNA sequence (GBSx0729) was identified in S.agalactiae <SEQ ID 2117> which encodes the amino
acid sequence <SEQ ID 2118>. This protein is predicted to be ABC transporter OrfX. Analysis of this
protein sequence reveals the following:

Possible site: 58 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.5121 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
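
The localisation report above has a regular shape (location, certainty value, verbal label), so it can be read programmatically. The following is a minimal sketch, assuming the lines have been cleaned to the form "bacterial <location> --- Certainty=<value> (<label>)"; the names `RESULT_RE`, `parse_final_results` and `best_location` are hypothetical and are not part of the prediction program itself.

```python
import re

# Matches lines such as "bacterial cytoplasm --- Certainty=0.5121 (Affirmative) < succ>".
RESULT_RE = re.compile(
    r"bacterial\s+(\w+)\s*-*\s*Certainty\s*=\s*([\d.]+)\s*\(([^)]+)\)"
)

def parse_final_results(text):
    """Return a list of (location, certainty, label) tuples found in the text."""
    return [
        (m.group(1), float(m.group(2)), m.group(3).strip())
        for m in RESULT_RE.finditer(text)
    ]

def best_location(results):
    """Pick the entry with the highest certainty value."""
    return max(results, key=lambda r: r[1])

report = """
bacterial cytoplasm --- Certainty=0.5121 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
"""

results = parse_final_results(report)
print(best_location(results))
```

The same function can be applied to any of the "Final Results" blocks in these Examples once extraction damage in the numbers has been repaired.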

The protein has homology with the following sequences in the GENPEPT database:

>GP:AAF99694 GB:AF267498 ABC transporter OrfX [Streptococcus mutans] 
Identities = 118/242 (48%) , Positives = 175/242 (71%) , Gaps = 1/242 (0%) 

Query: 5 INHLEKVFRTRFSKEETRALQDVDFKVEO^EFIAIMGESGSGKTTLLNILATLEKPTNGQ 64 
++HL+KV++T+ AL+D+ F V++GEFIAIMGESGSGK+TLLNILA ++ P++G

Sbjct: 6 VSHLKKVYKTQEGLTN-EALKDITFSVQEGEFIAIMGESGSGKSTBLNILACMDYPSSGH 64 

Query: 65 VIMGEDITKIKEAKLASFRLKNLGFVFQDFNLLDTLSVRDNIYLPLVLDRKRYKEMDHR 124 
+ 1 N + K+K+ + A FR +++GF+FQ+FNLL+ + +DN+ +P+++ + + R 
Sbjct: 65 IIFNNYQLEKVKDEEAAVFRSRHIGFIFQNFNLLNIFNNKDNLLIPVIISGSKVNSYEKR 124

Query: 125 LSELSSHLRIDDLLDKRPFELSGGQKQRVAIARSLITNPQILLADEPTAALDYRNSEDLL 184 

L +L+4 + 1+ LL K P+ELSGGQ+QR+AIAR+LI NP ++LADEPT LD + S+ +L 
Sbjct: 125 LRDLAAWGIESLLSKYPYELSGGQQQRIiAIARALIMNPDLILADEPTGQLDSKTSQRIL 184 


Query: 185 NLFETINLDGQTILMVTHSANAASHAKRVLF I KDGRI FHQLYRGNKNNSEFNKDISLTMS 244 

NL IN +TILMVTHS AAS+A RVLFIKDG IF+QL RG K+ F I + + 
Sbjct: 185 NLLSNINAKRKTIL^HSPI<AASTANRVLFIKDGVIFNQLVRGCKSREGFLDQIIMAQA 244 

Query: 245 AI 246

Sbjct: 245 SL 246 
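
Each alignment in these Examples is headed by a summary line giving identities, positives and gaps as count/length pairs with a percentage. As a rough sketch of how such a line could be read and its ratios recomputed (the names `FIELD_RE` and `parse_summary` are hypothetical, chosen only for this illustration):

```python
import re

# One "Name = count/length (percent%)" field of a BLAST-style summary line.
FIELD_RE = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)\s*\(\s*(\d+)\s*%\s*\)"
)

def parse_summary(line):
    """Return {field: (count, alignment_length, listed_percent)}."""
    return {
        m.group(1): (int(m.group(2)), int(m.group(3)), int(m.group(4)))
        for m in FIELD_RE.finditer(line)
    }

summary = parse_summary(
    "Identities = 118/242 (48%) , Positives = 175/242 (71%) , Gaps = 1/242 (0%)"
)

for name, (count, length, listed) in summary.items():
    # Show the recomputed ratio next to the percentage printed in the report.
    print(f"{name}: {count}/{length} = {100.0 * count / length:.1f}% (listed as {listed}%)")
```

For the identities field the listed value agrees with the ratio truncated to an integer (118/242 is about 48.8%, listed as 48%); the sketch makes no assumption about how the other percentages were derived.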

A related DNA sequence was identified in S. pyogenes <SEQ ID 2119> which encodes the amino acid
sequence <SEQ ID 2120>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence



Final Results 
bacterial cytoplasm --- Certainty=0.2131 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below: 

Identities = 91/222 (40%) , Positives = 142/222 (62%) , Gaps = 2/222 (0%) 

Query: 2 LLEINHLEKVFRTRFSKEETRALQDVDFK'/EOGSFIAIMGESGSGKTTLLNILATLEKPT 61
LL + + K + EE L+ +D +V +G+F+AIMG SGSGK+TL+NI+ L+KP
Sbjct: 1 LLNLKD1RKSYH--LGTEEFAILKGIDLEVNEGDFLAIMGPSGSGKSTLMNIIGCLDKPG 58

Query: 62 / Sbjct: 59 [rows missing in source]

Query: 122 / Sbjct: 119 [sequence rows missing in source; match line only]
- +P ELSGGQZQRVAIAR+L+TNP +L DEPT ALD -t

Query: 182 / Sbjct: 179 [sequence rows missing in source; match line only]
+++LF+ N +G+TI+++TH A+ K+ + ++DG I



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 



Example 688 

A DNA sequence (GBSx0730) was identified in S.agalactiae <SEQ ID 2121> which encodes the amino
acid sequence <SEQ ID 2122>. This protein is predicted to be nisin-resistance protein. Analysis of this
protein sequence reveals the following:

Possible site: 18

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood =-13.16 Transmembrane 8 - 24 ( 1 - 31)

Final Results

bacterial membrane --- Certainty=0.6265 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database: 



Query: 3 RKIVLLFWPMLIVLGILGVWHYYGSALNIYLLPPSSERYGRVILDRVEQRGLYSQGRQ 62 

++I+L V + LGI ++++G NIYL+PPS ++Y RV h +++ GL++ ++ 
Sbjct: 5 KRILLGLVAVCALFLGI IYFWGYKFNIYLVPPSPQKYVRVALKNMDELGLFTDSKE 60 

Query: 63 WQIIRQRSEKKLKTSKSYQESRNIVQEATOYGGGKHSQILSKETVRRDTLDSRYPEYRRL 122 

W ++++ ++ +K+Y E+ +Q+A++ GGKHS I +E + + ++ 4- 
Sbjct: 61 WVETKKKTIEETSNAKNYAETIPFLQKAIKVAGGKHSFIEHEEDISKRSITKYIKPKAEI 120 

Query: 123 NEDILLITIPSISKLDKRSISHYSGKLiO^nLMEKSYKGIiILDLSNNTGGNMIPMIGGVAS 182 

+ L++TIP + D ++ S Y+ L++ + +Y G1-I+DL N GG++ PM+ G++ 
Sbjct: 121 EGNTLILTIPEFTGNDSQA-SDYANFLESSFHKnNYNGVIvDLRGNRGGDLSP^WLGLSP 179 



Query: 183 ILPNDTLFHYTDKYGNKKTITMKNIPLEALKISRKTINTKH 1 / 

+LP+ TLF Y DK + K + ++N + + S K + K + PIA++ ++ T SS E 
Sbjct: 180 LLPDGTLFTYVDKSSHSKPVELQNGEINSGGSSTKVSDNKKIKKAPIAVLIDNNTGSSGE 239



Query: 240 MTFLSFKGLPNVKSFGQATAGYTTVNETFMLYDGARLALTTGIVSDRQGYKYENTPILPD 299
+T L FKG+PNVK G +AGYT+ N+T LYDG+ L +T+ V DR Y+N PI PD
Sbjct: 240 LTALCFKGIPNVKFLGSDSAGYTSANQTVYLYDGSTLQITSAFVKDRTNNIYKNFPISPD 299



Query: 300 QVTSLPLQESQSWLKSRI 317 
T+ + W+KS+I 

Sbjct: 300 IQTNNAKSSAIEWIKSQI 317
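
The start and end numbers printed on each alignment row are tied to the row's content: the end position equals the start position plus the number of residues in the row, minus one, with gap characters not counted. A short sketch of that check follows; the helper names `residues` and `row_is_consistent` are hypothetical and serve only to illustrate the arithmetic.

```python
def residues(seq):
    """Count columns that hold a residue letter, ignoring gaps ('-') and spaces."""
    return sum(1 for ch in seq if ch.isalpha())

def row_is_consistent(start, seq, end):
    """True when the row's end position matches start + residue count - 1."""
    return start + residues(seq) - 1 == end

# The final Query and Sbjct rows of the alignment above.
rows = [
    (300, "QVTSLPLQESQSWLKSRI", 317),
    (300, "IQTNNAKSSAIEWIKSQI", 317),
]

for start, seq, end in rows:
    print(start, end, row_is_consistent(start, seq, end))
```

A row that fails this check has usually lost or gained characters, which makes the test a convenient way to locate damaged rows in a transcribed alignment.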



No corresponding DNA sequence was identified in S.pyogenes.

A related GBS gene <SEQ ID 8641> and protein <SEQ ID 8642> were also identified. Analysis of this
protein sequence reveals the following:



Lipop: Possible site: -1 Crend: 3 
McG: Discrim Score: 12.71 
GvH : Signal Score (-7.5): -5.64 

Possible site: 18 
>>> Seems to have an uncleavable N-term signal seq
ALOM program count: 1 value: -13.16 threshold: 0. 
INTEGRAL Likelihood =-13.16 Transmembrane 
PERIPHERAL Likelihood = 4.03 174 
modified ALOM score: 3.13 

Step: 3 



bacterial membrane --- Certainty=0.6265 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases: 

34.7/62.5% over 311aa 

Lactococcus lactis 

GP|805128| nisin-resistance protein Insert characterized

ORF01108 (343 - 1254 of 1560)

GP|805128|gb|AAB08491.1||U25181 (7 - 318 of 318) nisin-resistance protein {Lactococcus
lactis}
%Match =19.4

%Identity =34.6 %Similarity =62.4

Matches = 106 Mismatches = 112 Conservative Sub.s = 85



231 261 291 321 351 393 423 

LKLSNL*EIGLKM*GYSKPFCHIIDLKRKGEQEMRRKIVLLFWPMLIVLGILGV WHYYGSALNIYLLPPSSE

: |:||:: | :::::| :||||:||| : 

MKIGKRILLGLVAVCALFLGIIYFWGYKFNIYLVPPSPQ 
10 20 30 

453 483 513 543 573 603 633 663

RYGRVILDRVEQRGLYSQGRQWQIIRQRSEKKLKTSKSYQESRNIVQEAVRYGGGKHSQILSKETVRRDTLDSRYPEYRR 
:| II I = = = ||:: ::| = = = = := =1=1 1= =l=l== lllll I :| = = == 
KYWVALKNI^ELGLFTDSKEWVETKKKTIEETSNAKiraAETIPFLQK^ 

50 60 70 80 90 100 110 


693 , 723 753 783 813 843 873 903 

IMDiLLITIPSISKLDKRSISHYSGKLQNILMEKSYKGLILDLSLmTGGN^IIPMIGGVASILPNDTLFHYTDKYGNKKT 
: = |::||| = I :: I h h= = = =1 1 = 1 = 11 I ll = = 11= l = = =11= III 111=1 
IEGOTLILTIPEFTGNDSQA-SDYANFLESSFHKNITOJGVIvIlLRGNRGGDLSPMVLGLSPLLPDGTLFTYVDKSSHSKP 
130 140 150 160 170 180 190



933 963 984 1014 1044 1074 1104 1134 

ITMKNIPLEALKISRKTINTKHV PIAIITNHKTASSAEMTFLSFKGLP^/KSFGQATAGYTTVNETFMLYDGARLAL 

: : = | : = 11=1= III- = = I II 1 = 1 I 111 = 1111 =1 =1111= 1 = 1 1111= I = 
VELQNGEINSGGSSTKVSDNKKIKKAPIAVLIDN1WGSSGELTALCFKGIPW/KFLGSDSAGYTSANQTVYLYDGSTLQI 
210 220 230 240 250 260 270 



1164 1194 1224 1254 1284 1314 1344 1374 

TTGIVSDRQGYKYENTPILPDQVTSLPLQESQSWLKSRINQN*GIINKGELYVIRNQSLRKSFSYTFFKRRDKGSTRRRF 
SEQ ID 2122 (GBS38) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 14 (lane 7; MW 37kDa). It was also expressed in E.coli as a GST-fusion
product. SDS-PAGE analysis of total cell extract is shown in Figure 16 (lane 12; MW 62kDa).

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 689 

A DNA sequence (GBSx0731) was identified in S.agalactiae <SEQ ID 2123> which encodes the amino 
acid sequence <SEQ ID 2124>. Analysis of this protein sequence reveals the following: 



Final Results

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.

A related DNA sequence was identified in S.pyogenes <SEQ ID 2125> which encodes the amino acid 
sequence <SEQ ID 2126>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1369 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 31/49 (63%) , Positives = 43/49 (87%) 

Query: 6 KKLTKSLGPIGKLISIIPDTTELIGKAIDNSRPIIEK3LDRRHEKKTDL 54 

K++ K+LG +GKL+SI+PDTTE+IGK IDNSRPIIEK ++++HEK+ L 
Sbjct: 3 KRIRKALGWGKLMSIVPDTTEI IGKTIDNSRP I IEKRMEQKHEKEMQL 51 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 690 

A DNA sequence (GBSx0732) was identified in S.agalactiae <SEQ ID 2127> which encodes the amino 
acid sequence <SEQ ID 2128>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3644 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
The protein has no significant homology with any sequences in the GENPEPT database, but there is
homology to SEQ ID 2126.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 691 

A DNA sequence (GBSx0733) was identified in S.agalactiae <SEQ ID 2129> which encodes the amino 
acid sequence <SEQ ID 2130>. This protein is predicted to be 28 kd outer membrane protein precursor 
(yaeC). Analysis of this protein sequence reveals the following: 

Possible site: 16 

>>> May be a lipoprotein

Final Results

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAB59827 GB:AJ012388 hypothetical protein [Lactococcus lactis]
Identities = 123/290 (42%) , Positives = 178/290 (60%) , Gaps = 18/290 (6%) 

Query: MKI KKLLGLTTTWISAL ILGAC GQS XNEDAKWRVGTMVKSKTEK&R5TOKIEE 54
+K +++L +T +++ +I+G G +K+V++G M K E W ++++
Sbjct: 3 1TIIILVFII1VGGIFAFSHSGNKSKVSSKIVKIGLMPGGKQEDVIWKQVQK 61

Query: 55 K-GVKLKFTEFTDYTQPNKALESDEIDINAFQHYNYLNM 113
V G+ LKF FTD +PNKAL + E+D+NAFQHY YL +WNKAN N+VS+ +T

[The Sbjct: 62 row and the rows beginning at Query 114, Query 174 / Sbjct 176 and Query 234 / Sbjct 236 are missing in the source; one match line survives:]
T+ D+ NPKSL +KE+DA+QT R+LDS AAVIN +F A + K +1+ EP -I



A related DNA sequence was identified in S.pyogenes <SEQ ID 2131> which encodes the amino acid 
sequence <SEQ ID 2132>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1766 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 145/264 (54%) , Positives = 203/264 (75%) , Gaps = 2/264 (0%) 

Query: 20 LGACGQSKNEDAKVVRVGTtWKSKTSKARWDKIEELTO^ 79 

L AC + K +D + +G M K+++++ARWDK+EEL+KK + LK+ EFTDY+QPNKA+ 
Sbjct: 1 LVACSE-KQDDKNTLTIGVMTKTESDQARWDKVEELLKKDNITLKYKEFTDYSQPNKAVA 59 
Query: 80 SDEIDINAFQHYOTI^NNWNKANKTNLVSVAETYFTSFRLYSGT-KNGKGKyQTVSEIPNK 138

+ E+DINAFQHYN+LKNWNK NK +LV++A+TY + L+SGT ++GK KY++V+++PN 
Sbjct: 60 NGEVDimFQHWIM^KENKEHLVAIADWISPINLFSGTSQDGKAKYKSVADLPNG 119 






Query: 139 ATITIPITOAVNESRSLYLLQSAGLLKLKVSGDALP.TMSDWSNPKSLDLKEVDAAQTARS 198 

I +PNDA NESR+LY+LQSAGL+KL VSGD LAT++++ N K LD+KE+DA+QTAR+ 
Sbjct: 120 TQIAVPNDATIffiSRALYVLQSAGLIKI^'SGDQIATIANISENKKKLDIKELDASQTARA 179 






Query: 199 LDSTDAAVINHDFVTEAGINPKSAIFIEPKSKKAKQWYNLLVAQKGWQDKSKAKAIKEW 258 

L S DAA.V+NN + A 1+ K+4+F E N+KQW N++ QK W+ KA AIK+++ 
Sbjct: 180 IiVSADAAVVNNSYAVPAKIDYKTSLFICEI<ADDNSKQWINIIAGQI<DWEKSEKADAIKKLI 239 







SEQ ID 2130 (GBS96) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 19 (lane 7; MW 32kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 22 (lane 3; MW 57.2kDa). 

The GBS96-GST fusion product was purified (Figure 195, lane 10) and used to immunise mice. The
resulting antiserum was used for FACS (Figure 290), which confirmed that the protein is immunoaccessible 
on GBS bacteria. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 692

A DNA sequence (GBSx0734) was identified in S.agalactiae <SEQ ID 2133> which encodes the amino 
acid sequence <SEQ ID 2134>. Analysis of this protein sequence reveals the following: 
Possible site: 61 

>>> Seems to have no N-terminal signal sequence



A related GBS nucleic acid sequence <SEQ ID 9807> which encodes amino acid sequence <SEQ ID 9808> 
was also identified. 

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 
Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 693 

A DNA sequence (GBSx0735) was identified in S.agalactiae <SEQ ID 2135> which encodes the amino 
acid sequence <SEQ ID 2136>. This protein is predicted to be glucose-inhibited division protein (gid). 
Analysis of this protein sequence reveals the following:

Possible site: 18 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.5103 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

Final Results

bacterial cytoplasm --- Certainty=0.0656 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAB13486 GB:Z99112 glucose-inhibited division protein [Bacillus subtilis] 
Identities = 289/439 (65%) , Positives = 352/439 (79%) , Gaps = 10/439 (2%) 

Query: 1 MSQSYINVIGAGLAGSEAAYQIAKRGIPVKLVEMRGVKSTPQHKTDNFAELVCSNSFRGD 60 

M+Q +NVIGAGLAGSEAA+Q+AKRGI VKLYEMR VK TP H TD FAELVCSNS R + 
Sbjct: 1 mQQTvOTIGAGIAGSFARWQIAKRGIQVTCLYEMRPVKQTPAHHTDKFAELVCSNSLRSN 60 

Query: 61 SLTNAVGLLKEEMRRLDSIIMRNGEAHRVPAGGAMAVDREGYSFAVTEEIHKHPLIEVIR 120 

+L NAVG+LKEEMR LDS 1+ + VPAGGA+AVDR ++ +VT + HP + VI 
Sbjct: 61 TLANAVGVLKFJSMRALDSAIIAAADECSVPAGGAIAVDRH^ 120 

Query: 121 DEITDIPGDAITVIATGPLTSDSLAAKIHEUaGGDGFYFYDAAAPIvDKOTIDINKVYLK 180 

+E+T+IP + T+IATGPLTS+SL+A++ Hi G D YFYDAAAPIV+K+++D++KVYLK 
Sbjct: 121 EEVTEIP-EGPTIIATGPLTSESLSAQLKELTGEDYLYFyDAAAPITOKDSLDMDKVYLK 179 

Query: 181 SRYDKGEAAYLNCPMTKEEFmFHEALTTAEEAPLNSFEKEKYFEGCMPIEVMAKRGIKT 240 

SRYDKGEAAYLNCPMT+EEF FHEALT+AE PL FEKE +FEGCMPIEVMAKRG KT 
Sbjct: 180 SRYDKGEAAYLNCPMTEEEFDRFHEALTSAETVPLKEFEKEIFFEGCMPIEVMAKRGKKT 239 

Query: 241 MLYGPMKPVGLEYPEDYKGPRDGEFKTPYAWQLRQDNAAGSLYNIVGFQTHLKWGEQKR 300 

ML+GPMKPVGLE+P K PYAWQLRQD+AAG+LYNIVGFQTHLKWG+QK 
Sbjct: 240 MLFGPMKPVGLEHPVTGK RPYAWQLRQDDAAGTLYNIVGFQTHLKWGDQKE 291 

Query: 301 VFQHIPGLENftEFvRYGVMHRNSY^SPNLLNQTFATRKNPNLFFAGQMTGVEGYvESAA 360 

V ++IPGLEN E VRYGVMHRN++++SP+LL T+ + +LFFAGQMTGVEGYVESAA 
Sbjct: 292 vLKLIPGLENVEIVRYGVMHRHTFINSPSLLKPTYQFKNRSDLFFAGQMTGVEGYVESAR. 351 

Query: 361 SGLVAGINAWRFNGESEWFPQTTAIGALPHYITHTDSKHFQPMStVNFGIIKELEGPRI 420 

SGLVAGINA + GE V+FPQ TAIG++ HYIT T+ K+FQPMN NFG++KEL 4 1 
Sbjct: 352 SGLVAGI^mAKLVLGEELVIFPQETAIGSMAHYITTIMQKNFQPMNANFGLLKELP-VKI 410 

Query: 421 RDKKERYEAIATRALKDLE 439 

++KKER E A RA++ ++ 
Sbjct: 411 KNKKERNEQYANRAIETIQ 429 

A related DNA sequence was identified in S.pyogenes <SEQ ID 2137> which encodes the amino acid 
sequence <SEQ ID 2138>. Analysis of this protein sequence reveals the following: 
Possible site: 30 



Final Results 

bacterial membrane --- Certainty=0.4376 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

RGD motif: 111-113 

The protein has homology with the following sequences in the databases: 

>GP:CAB13486 GB:Z99112 glucose-inhibited division protein [Bacillus subtilis]
Identities = 292/435 (67%) , Positives = 350/435 (80%) , Gaps = 10/435 (2%) 

Query: 59 IWIGAGIAGSEAAYQIAKRGIPVKLYEMRGVT<ATPQHKTTNFAELVCSNSFRGDSLTNA 118 

+NVIGAGLAGSEAA+Q+AKRGI VKLYEMR VK TP H T FAELVCSNS R ++L NA 
Sbjct: 6 VNVIGAGIAGSEAAWQIAKRGIQVKLYEMRPVKQTPRHHTDKFA3LVCSNSLRSNTLANA 65 

Query: 119 VGLLKEEMRRLDSIIMRNGEA!WVPA(3CR^VDREGYAESVTAELENHPLIEVIRGEITE 178
VG+LKEEMR LDS 1+ + VPAGGA+AVDR +A SVT ++NHP + VI E+TE
Sbjct: 66 VGVLKEEMRALDSAIIAAADECSVPAGGAIAVDRHEFAASVTl^VKl^PNVTVINEEVTE 125

Query: 179 IPDDAITOIATGPLTSD7AlAEKIHALNGQDGF1fFYDAAAPIIDKSTIDMSKVYLKSRYDK 238
IP+ T+ IATGPLTS + +L+ ++ L G D YFYDAAAP1++K ++DM KVYLKSRYDK
Sbjct: 126 IPEGP-TIIATGPLTSESLSAQLKELTGEDYLYFYDAAAPIVEKDSLDMDKVYLKSRYDK 184

Query: 239 GEAAYLNCPMTKEEFMAFHEALTTAEEAFLN^FEKEKYFEG^MPIEVMAKRGIKTMLYGP 298
GEAAYLNCPMT+EEF FHEALT+AE PL FEKE +FEGCMPIEVMAKRG KTML+GP
Sbjct: 185 GEAAYLNCPMTEEEFDRFHEALTSAETi/PLKEFEXEIFFEGCMPIEVMAKRGKKTMLFGP 244

Query: 299 MKPVGLEYPDDYTGPRDGEFKTPYAWQLRQDNAAGSLYMIVGFQTHLKWGEQKRVFQMI 358
MKPVGLE+P TG R PYAWQLRQD+AAG+LYNIVGFQTHLKWG+QK V ++I
Sbjct: 245 MKPVGLEHP - -VTGKR PYAWQLRQDDAAGTLYNIVGFQTHLKWGDQKEVLKLI 296

Query: 359 [sequence row missing in source] 418
PGLEN E VRYGVMHRN++++SP+LL T+Q ++ +LFFAGQMTGVEGYVESAASGLVA
Sbjct: 297 PGLENVEIVRYGVMHRNTF1NSPSLLKPTYQFKNRSDLFFAGQMTGVEGYVESAASGLVA 356

Query: 419 GINAARLFKREEALI FPQTTAIGSLPHYVTHADSKHFQPMIWNFGI I KELEGPRIRDKKE 478
GINAA+L EE +IFPQ TAIGS4- HY+T 4 K+FQPMN NFG++KEL +I++KKE
Sbjct: 357 GINAAKLVLGEELVIFPQETAIGSMAHYITTTNQKNFQPMNANFGLLKELP-VK1KNKKE 415

Query: 479 RYEAIASRALADLDT 493
R E A+RA+ + T
Sbjct: 416 RNEQYANRAIETIQT 430





An alignment of the GAS and GBS proteins is shown below: 
Identities = 395/439 (89%) , Positives = 417/439 (94%) 







Query: 4 SYINVIGAGLAGSEAAYQIAKRGIPVKLYEMRGVKSTPQHKTDNFAELVCSNSFRGDSLT 63
+YINVIGAGLAGSEAAYQIAKRGIPVKLYEMRGVK+TPQHKT NFAELVCSNSFRGDSLT
Sbjct: 57 TYINVIGAGLAGSEAAYQIAKRGIPVKLYEMRGVKATPQHKTTNFAELVCSNSFRGDSLT 116

Query: 64 NAVGLLKEEMRRLDSIIMRNGEAHRVPAGGAMAVDREGYSEAVTEEIHKHPLIEVIRDEI 123
NAVGLLKEEMRRLDSIIMRNGEA+RVPAGGAMAVDREGY+E+VT E+ HPLIEVIR EI
Sbjct: 117 NAVGLLKEEMRRLDSIIMRNGFAITOVPAGGAMAVDREGYAESVTAELENHPLIEVIRGEI 176

Query: 124 TDIPGDAITVIATGPLTSDSLAAKIHELNGGDGFYFYDAAAPIVDKNTIDINKVYLKSRY 183
T+IP DAITVIATGPLTSD+LA KIH LNGGDGFYFYDAAAPI+DK+TID++KVYLKSRY
Sbjct: 177 TEIPDDAITVIATGPLTSDALAEKIHALNGGDGFYFYDAAAPIIDKSTIDMSKVYLKSRY 236

Query: 184 DKGEAAYLNCPMTKEEFMAFHEALTTAEEAPLNSFEKEKYFEGCMPIEVMAKRGIKTMLY 243
DKGEAAYLNCPMTKEEFMAFHEALTTAEEAPLN+FEKEKYFEGCMPIEVMAKRGIKTMLY
Sbjct: 237 DKGEAAYLNCPMTKEEFMAFHEALTTAEEAPLNAFEKEKYFEGCMPIEVMAKRGIKTMLY 296

Query: 244 GPMKPVGLEYPEDYKGPRDGEFKTPYAWQLRQDNAAGSLYNIVGFQTHLKWGEQKRVFQ 303
GPMKPVGLEYP+DY GPRDGEFKTPYAWQLRQDNAAGSLYNIVGFQTHLKWGEQKRVFQ
Sbjct: 297 GPMKPVGLEYPDDYTGPRDGEFKTPYAWQLRQDNAAGSLYNIVGFQTHLKWGEQKRVFQ 356

Query: 304 MIPGLENAEFVRYGVMHRNSYMDSPNLLNQTFATRKNPNLFFAGQMTGVEGYVESAASGL 363
MIPGLENAEFVRYGVMHRNSYMDSPNLL +TF +R NPNLFFAGQMTGVEGYVESAASGL
Sbjct: 357 MIPGLENAEFVRYGVMHRNSYMDSPNLLTETFQSRSNPNLFFAGQMTGVEGYVESAASGL 416

Query: 364 VAGINAVRRFNGESEWFPQTTAIGALPHYITHTDSKHFQPMNVNFGIIKELEGPRIRDK 423
VAGINA R F E ++FPQTTAIG+LPHY+TH DSKHFQPMNVNFGIIKELEGPRIRDK
Sbjct: 417 VAGINAARLFKREEALIFPQTTAIGSLPHYVTHADSKHFQPMNVNFGIIKELEGPRIRDK 476

Query: 424 KERYEAIATRALKDLEKFL 442
KERYEAIA+RAL DL+ L
Sbjct: 477 KERYEAIASRALADLDTCL 495





Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
Example 694

A DNA sequence (GBSx0736) was identified in S.agalactiae <SEQ ID 2139> which encodes the amino 
acid sequence <SEQ ID 2140>. This protein is predicted to be transcriptional regulator (GntR family).
Analysis of this protein sequence reveals the following: 

Possible site: 13

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.5103 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database: 

>GP:BAB04138 GB:AP001508 transcriptional regulator (GntR family) 
[Bacillus halodurans] 
Identities = 83/229 (36%) , Positives = 133/229 (57%) , Gaps = 1/229 (0%) 

Query: 2 LPAYIKIHDAIKKEIDKGTWKIGQRLPSERDLADDYSVSRMTLRQSITLLVEEGILERRV 61

LP Y +1 + IK++I+ G K G L SER+ A+ Y VSRMT+RQ+I LV +G + ++ 
Sbjct: 8 LPIYYQIEEQIKQQIESGVLKPGDMLKSEREYAEYYDVSRMTVRQAINKILVNQGYIYKKK 67 

Query: 62 GSGTWASHRVQEKMRGTTSFTEIVNSQGRKPSSKLISFQRKLANETEIQKLNLSQSDYV 121 

GSGTYV ++++ + G TSFTE + +G +PSS+L+ F+ A ++LNL ++ V 

Sbjct: 68 GSGTYVQE KKI EQALNGLTS FTEDMRKRGME PSSRLLKFELI PATAKIAKELNLKENTPV 127 

Query: 122 VRMERVRYADKVPLVYEVASIPENLIKGFEQSEVTEHFFKTLTEN-GYEIGKSQQT1YAR 180 

++R+RY D VP+ E +P NL+KG + ++++ + E I+QIA 
Sbjct: 128 TEIKRIRYGDGVPIAIERNLIjPANLVKGLNEEIINQSLYQYIEEELNLRIADALQVlEAS 187 

Query: 181 NASERVASHLETOAGHAILALTQVSYFTDGKPFEYVHGQYVGDRFEFYL 229 

AS+ A LE+ G IL + + ++ DG E V Y DR++F + 
Sbjct: 188 TASKTEADLLEIQKGSPILIiIERKTFLADGTVLELVKSAYRADRYKFMI 236 

There is also homology to SEQ ID 1256. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 695 

A DNA sequence (GBSx0737) was identified in S.agalactiae <SEQ ID 2141> which encodes the amino 
acid sequence <SEQ ID 2142>. This protein is predicted to be GMP synthase (guaA). Analysis of this 
protein sequence reveals the following: 

Possible site: 46 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -0.96 Transmembrane 228 - 244 ( 228 - 245)

Final Results

bacterial membrane --- Certainty=0.1383 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAD15805 GB:AF058326 GMP synthase [Lactococcus lactis] 
Identities = 416/511 (81%) , Positives = 467/511 (90%) , Gaps = 3/511 (0%) 

Query: 10 IQKIIVLDYGSQYNQLIARRIREFGVFSELKSHKITADEIRDINPIGIVLSGGPNSVYAD 69 

++KIIVLDYGSQYNQLIARRIRE GVFSEL SHK+TA EIR+INPIGI+LSGGPNSVY + 
Sbjct: 6 LEKIIVLDYGSQYNQLIARRIREIGVFSELMSHKVTAKEIREINPIGIILSGGPNSVYDE 65

[The sequence rows of the following blocks are missing in the source; the surviving match lines are listed under their Query / Sbjct start positions.]

Query: 70 / Sbjct: 66
G+F ID EIFELG+P+LGI CYGMQL+ + +KLGG V AGE REYG +

Query: 130 / Sbjct: 123
GTP+ Q VLMSHGD VT IPEGFH+VG S + PFAA+ENTE+ YGIQFHPEVRHSV+G

Query: 190 / Sbjct: 183
++L+NFA+NICGA+G+WSM+NFIDM+I IRE VGD+KVLLGLSGGVDSSWGVLLQRAI

Query: 250 / Sbjct: 243
GDQLT IFVDHG LRK E DQVM+ LGGKFGLNII+VEA KRF+D L G+ DPE +RKII

Query: 310 / Sbjct: 303
[match line missing in source]

Query: 370 / Sbjct: 363
PLNTLFKDEVRALGT LGMPDE+VWRQPFPGPGLAIRV+G++TEEKLETVRESDAILREE

Query: 430 / Sbjct: 423
IA +GL+RDVWQYFTVNT V+SVGVMGD RTYDYT+AIRAITSIDGMTADFAQLPWD+L+

Query: 490 [sequence row missing in source]
KIS RIVNEVDHVNRIVYDITSKPPATVEW+
Sbjct: 483 KISKRIVNEVDHVNRIVYDITSKPPATVEWQ 513

A related DNA sequence was identified in S.pyogenes <SEQ ID 2143> which encodes the amino acid
sequence <SEQ ID 2144>. Analysis of this protein sequence reveals the following:

Possible site: 46 
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -0.96 Transmembrane 228 - 244 ( 228 - 245)

Final Results

bacterial membrane --- Certainty=0.1383 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

RGD motif: 203-205 



The protein has homology with the following sequences in the databases: 

>GP:AAD15805 GB:AF058326 GMP synthase [Lactococcus lactis] 
Identities = 411/511 (80%) , Positives = 464/511 (90%) , Gaps = 3/511 (0%) 

Query: 10 VQKIIVLDYGSQYNQLIARRIREFGVFSELKSHKITAQELREINPIGIVLSGGPNSVYAD 69
++KIIVLDYGSQYNQLIARRIRE GVFSEL SHK+TA+E+REINPIGI+LSGGPNSVY +
Sbjct: 6 LEKIIVLDYGSQYNQLIARRIREIGVFSELMSHKVTAKEIREINPIGIILSGGPNSVYDE 65

Query: 70 NAFGIDPEIFELGIPILGICYGMQLITHKLGGKVVPAGQAGNREYGQSTLHLRETSKLFS 129
+F IDPEIFELG+P+LGICYGMQL+++KLGG V AG+ REYG + L L E S LF+
Sbjct: 66 GSFDIDPEIFELGLPVLGICYGMQLMSYKLGGMVEAAGE---REYGVAPLQLTEKSALFA 122

Query: 130 GTPQEQLVLMSHGDAVTEIPEGFHLVGDSNDCPYAAIENTEKNLYGIQFHPEVRHSVYGN 189
GTP+ Q VLMSHGD VT IPEGFH+VG S + P+AA+ENTE+NLYGIQFHPEVRHSV+G
Sbjct: 123 GTPEVQDVLMSHGDRVTAIPEGFHVVGTSPNSPFAAVENTERNLYGIQFHPEVRHSVHGT 182

Query: 190 DILKNFAISICGARGDWSMDNFIDMEIAKIRETVGDRKVLLGLSGGVDSSVVGVLLQKAI 249
++L+NFA++ICGA+G+WSM+NFIDM+I IRE VGD+KVLLGLSGGVDSSVVGVLLQ+AI
Sbjct: 183 EMLRNFALNICGAKGNWSMENFIDMQIKDIREIVGDKKVLLGLSGGVDSSVVGVLLQRAI 242

Query: 250 GDQLTCIFVDHGLLRKDEGDQVMGMLGGKFGLNIIRVDASKRFLDLLADVEDPEKKRKII 309
GDQLT IFVDHG LRK E DQVM LGGKFGLNII+VDA KRF+D L + DPE +RKII
Sbjct: 243 GDQLTSIFVDHGFLRKGEADQVMETLGGKFGLNIIKVDAQKRFMDKLVGLSDPETQRKII 302

Query: 310 GNEFVYVFDDEASKLKGVDFLAQGTLYTDIIESGTETAQTIKSHHNVGGLPEDMQFELIE 369
GNEFVYVFDDEA+KL+GVDFLAQGTLYTD+IESGT+TAQTIKSHHNVGGLPEDMQF+LIE
Sbjct: 303 GNEFVYVFDDEANKLEGVDFLAQGTLYTDVIESGTDTAQTIKSHHNVGGLPEDMQFQLIE 362

Query: 370 PLNTLFKDEVRALGIALGMPEEIVWRQPFPGPGLAIRVMGAITEEKLETVRESDAILREE 429
PLNTLFKDEVRALG LGMP+EIVWRQPFPGPGLAIRV+G +TEEKLETVRESDAILREE
Sbjct: 363 PLNTLFKDEVRALGTQLGMPDEIVWRQPFPGPGLAIRVLGDLTEEKLETVRESDAILREE 422

Query: 430 IAKAGLDRDVWQYFTVNTGVRSVGVMGDGRTYDYTIAIRAITSIDGMTADFAQLPWDVLK 489
IA +GL+RDVWQYFTVNT V+SVGVMGD RTYDYT+AIRAITSIDGMTADFAQLPWD+L+
Sbjct: 423 IAASGLERDVWQYFTVNTDVKSVGVMGDQRTYDYTLAIRAITSIDGMTADFAQLPWDLLQ 482

[The alignment rows beyond Query 489 / Sbjct 482 are missing in the source.]

An alignment of the GAS and GBS proteins is shown below: 

Identities = 487/520 (93%) , Positives = 505/520 (96%) 

Query: 1 MTDISILNDIQKI IVLDYGSQYNQLIARRIREFGVFSELKSHKITADEIRDINPIGIVLS 60 

MT+ISILND+QKIIVLDYGSQYWQLIARRIREFGVFSELKSHKITA E+R+INPIGIVLS 
Sbjct: 1 MTEISILNDVQKIIVLDYGSQYNQLIARRIREFGVFSELKSHKITAQELREINPIGIVLS 60 

Query: 61 GGPNSVYADGAFGIDEEIFELGIPILGICYGMQLITHKLGGKVLPAGEAGHREYGQSALR 120 

GGPNSVYAD AFGID EIFELGIPILGICYGMQLITHKLGGKV+PAG+AG+REYGQS L 
Sbjct: 61 GGPNSVYADNAFGIDPEIFELGIPILGICYGMQLITHKLGGKWPAGQAGIvlREYGQSTIiH 120 

Query: 121 LRSESALFAGTPQEQLVLMSHGDAVTEIPEGFHLVGDSVDCPFAAMENTEKQFYGIQFHP 180 

LR S LF+GTPQEQLVLMSHGDAVTEIPEGFHLVGDS DCP+AA+ENTEK YGIQFHP 
Sbjct: 121 LRETSKLFSGTPQEQLVLMSHGDAVTEIPEGFHLVGDSNDCPYAAIENTEKNLYGIQFHP 180 

Query: 181 EVRHSVYGITOILKNFAWia^GDWSMDNFIDMEIAKIRETVGDRKVLLGIiSGGVDSSV 240 

EVRHSVYGNDILKNFA++ICGARGDWSHDNFIDMEIAKIRETVGDRKVLLGLSGGVDSSV 
Sbjct: 181 EWHSVYGNDILKNFAISICGARGDWSTONFIDKEIAKIRETVGDRKVLLGLSGGVDSSV 240 

Query: 241 VGVLLQRAIGDQLTCI FVDHGLLRKNEGDQVMDKLGGKFGLNI IRVDASKRFLDLLSGVE 300 

VGVLLQ+AIGDQLTCIFVDHGLLRK+EGDQvM MLGGKFGLNI IRVDASKRFLDLL+ VE 
Sbjct: 241 VGVLLQKAIGDQLTC I FVDHGLLRKDEGDQVMGMLGGKFGLNI I RVDASKRFLDLLADVE 300 

Query: 301 DPERKRKIIGNEFVYVFDDEASKLKGVDFIAQGTLYTDirESGTETAQTIKSHHNVGGLP 360 

DPE+KRKIIGNEFVYVFDDEASKLKGVDFLflQGTLYTDIIESGTETAQTIKSHHNVGGLP 
Sbjct: 301 DPEKKRKI IGNEFVYVFDDEASKLKGVDFLAQGTLYTD I IESGTETAQTI KSHHNVGGLP 3 60 

Query: 361 EDMQFEL I EPLNTLFKDEVRALGTALGMPDE VWRQPFPG PGLAIR VMGE ITEEKLETVR 420 

EDMQFELIEPLNTLFKDEVRALG ALGMP+E+VWRQPFPGPGLAIRVMG ITEEKLETVR 
Sbjct: 361 EDMQFELIEPIJSTTLFKIDEVRALGIALGMPEEIWRQPFPGPGI^IRvMGAITEEKLETVR 420 

Query: 421 ESDAILREEIAKAGLDRDWQYFTVNTGVRSVGVMGDGRTYDYTIAIRAITSIDGMTADF 480 

ESDAILREEIAKAGLDRDVWQYFTVNTGTOSVGVMGDGRTYDYTIAIRAITSIDGMTADF 
Sbjct: 421 ESDAILREEIAKAGLDRDVWQYFTVNTGVRSVGVMGDGRTYDYTIAIRAITSIDGMTADF 480 

Query: 481 AQLPWDVLKKISTRIVNEVDHVNRIVYDITSKPPATVEWE 520 

AQLPWD VLKKI STRI VNEVDHVNRI VYDITSKPPATVEWE 
Sbjct: 481 AQLPWD VLKKI STRIVE vDHVKRIVYDITSKPPATVEWE 520 



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
Example 696

A DNA sequence (GBSx0740) was identified in S.agalactiae <SEQ ID 2145> which encodes the amino 
acid sequence <SEQ ID 2146>. This protein is predicted to be branched chain amino acid ABC transporter, 
periplasmic amino acid-bind. Analysis of this protein sequence reveals the following: 

Possible site: 58

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0957 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9409> which encodes amino acid sequence <SEQ ID 9410>
was also identified.

The protein has homology with the following sequences in the GENPEPT database:

>GP:AAD36211 GB:AE001771 branched chain amino acid ABC transporter, 

periplasmic amino acid-binding protein [Thermotoga maritima] 
Identities = 31/92 (33%) , Positives = 51/92 (54%) , Gaps = 4/92 (4%)

Query: 26 AKAFHDHYVTCAYGEEPSMFSALSYnAVYMAAKSAKGAKTSID IKKALAKLKDFKGVT 82

AK F + Y 4 YG+EP+ +AL YDA YM A SD I + + K++FG + 

Sbjct: 275 AKI^FVEWICEKYGKEPAAI^ALGYDA-YMVLLDAIERAGSFDREKIAEEIRKTENFNGAS 333 

Query: 83 GKMSIDKNHNWKBAYWKLEDGKTSSVNIIS 114 
G ++ID+N + +KS V +++G +1 +

Sbjct: 334 GIINIDENGDAIKSVWNIVKNGSVDFEAVIN 365 

No corresponding DNA sequence was identified in S.pyogenes. 

SEQ ID 9410 (GBS660) was expressed in E.coli as a GST-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 135 (lane 8 & 9; MW 71.5kDa + lane 10; MW 27kDa). It was also expressed in
E.coli as a His-fusion product. SDS-PAGE analysis of total cell extract is shown in Figure 141 (lane 2; MW
46.5kDa) and in Figure 181 (lane 3; MW 46kDa).

GBS660-His was purified as shown in Figure 233, lane 5-6. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 697 

A DNA sequence (GBSx0741) was identified in S.agalactiae <SEQ ID 2147> which encodes the amino 
acid sequence <SEQ ID 2148>. Analysis of this protein sequence reveals the following: 



Possible site: 27

>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood =-10.61 Transmembrane 140 - 156 ( 129 - 158)
INTEGRAL Likelihood = -9.55 Transmembrane 60 - 76 ( 53 - 80)
INTEGRAL Likelihood = -7.59 Transmembrane 264 - 280 ( 257 - 285)
INTEGRAL Likelihood = -5.79 Transmembrane 232 - 248 ( 219 - 251)
INTEGRAL Likelihood = -2.23 Transmembrane 190 - 206 ( 190 - 207)
INTEGRAL Likelihood = -1.75 Transmembrane 90 - 106 ( 90 - 110)
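
The INTEGRAL entries above list one predicted membrane-spanning segment each: a likelihood score, a core segment and, in parentheses, an extended range. A minimal sketch of how these entries could be tabulated in sequence order is given below. The names `TM_RE` and `parse_segments` are hypothetical, and treating the most negative likelihood as the strongest prediction is an assumption made for this illustration.

```python
import re

# One entry: "INTEGRAL Likelihood =-10.61 Transmembrane 140 - 156 ( 129 - 158)".
TM_RE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+(?:\.\d+)?)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)

def parse_segments(text):
    """Return the predicted segments sorted by the start of their core region."""
    segments = []
    for m in TM_RE.finditer(text):
        segments.append({
            "likelihood": float(m.group(1)),
            "core": (int(m.group(2)), int(m.group(3))),
            "extended": (int(m.group(4)), int(m.group(5))),
        })
    return sorted(segments, key=lambda s: s["core"][0])

alom_output = """
INTEGRAL Likelihood =-10.61 Transmembrane 140 - 156 ( 129 - 158)
INTEGRAL Likelihood = -9.55 Transmembrane 60 - 76 ( 53 - 80)
INTEGRAL Likelihood = -7.59 Transmembrane 264 - 280 ( 257 - 285)
INTEGRAL Likelihood = -5.79 Transmembrane 232 - 248 ( 219 - 251)
INTEGRAL Likelihood = -2.23 Transmembrane 190 - 206 ( 190 - 207)
INTEGRAL Likelihood = -1.75 Transmembrane 90 - 106 ( 90 - 110)
"""

segments = parse_segments(alom_output)
# Assumption: the most negative score marks the most confident segment.
strongest = min(segments, key=lambda s: s["likelihood"])

for seg in segments:
    start, end = seg["core"]
    print(f"{start:>3} - {end:<3} length {end - start + 1}  likelihood {seg['likelihood']}")
```

Sorting by position rather than by score makes the layout of the six predicted segments along the protein easier to read.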



Final Results 

bacterial membrane --- Certainty=0.5246 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
A related GBS nucleic acid sequence <SEQ ID 10059> which encodes amino acid sequence <SEQ ID
10060> was also identified.

The protein has homology with the following sequences in the GENPEPT database:

>GP:AAD36212 [...] chain amino acid ABC transporter,
permease protein [Thermotoga maritima]
Identities = 140/295 (47%), Positives = 200/295 (67%), Gaps = 7/295 (2%)

Query: 2 LQQLWGLILGSIYALIiALGYTMVYGIIKLINFAHGDIYMMGAFMGYYLINHLHLNFFIiA 61
LQ L NG++LG +YAL+A+GYTMVYGI++LINFAHGD+ MMG + +Y L LN +
Sbjct: 5 LQI^FNGIMLGGLYALIAIGTOIWGILRLINFAKGDVMMMGVYFAFYAATLLSLNPLFS 64

[The remaining alignment blocks are largely illegible. Legible coordinates, in order of appearance: 62; Sbjct 65; 122; Sbjct 125; Sbjct 185; 237; Sbjct 245. Surviving fragments:]
rM/GADTRAFPQA 121
LG +1+ 4-AY+PLR + RI-fALITAIGVSF LE
GA +GG 4-+G++E
- ILI+I L++P+G+LGK I EKV

There is also homology to SEQ ID 2150. A related sequence was also identified in GAS <SEQ ID 9171>
which encodes the amino acid sequence <SEQ ID 9172>. Analysis of this protein sequence reveals the
following:

Possible site: 30
>>> Seems to have an uncleavable N-term signal seq
INTEGRAL Likelihood =-12.74 Transmembrane 196 - 212 ( 191 - 219)
[Further INTEGRAL predictions are only partly legible: likelihood values truncated to -12, -7, -4, -2, -2, -1 and -1; transmembrane segments 106 - 122 and 317 - 333.]

Final Results

bacterial membrane --- Certainty=0.609 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>


An alignment of the GAS and GBS proteins is shown below: 

Identities = 35/147 (23%) , Positives = 71/147 (47%) , Gaps = 6/147 (4%) 

Query: 134 ITOTQLIILGI--ALLLMLTLQFIVQKTKI*lGKAI'lRALS\ r DSDAAQLMGINvWRTISFTFA 191 

+TN I +GI A++ + + F+4- KT +G +R++ ++ A++ G++ RTI + 
Sbjct: 197 LTHNSRINIGIFFAIIAIALIWFLLNKTTLGFEIRSVGLNPHASEYAGMSSKRTIILSMI 256 

Query: 192 LGSALAGAGGVL - - IGLYYNSVQPLMGVTPGLKAFVAA'VIiGGIGI I PGAAIGGFVTGI LE 249 

+ ALAG GGV+ +G + N + G ++L + G F+ G+L 

Sbjct: 257 ISGAIAGLGGVVEGILGTFEWFVC^SSIAVGFDGffiVSLIiAANSPL-GIFFSSFLFGVLN 315 

Query: 250 TLATALGVSDFRDGIVYAILI-LIFLI 275 

A + ++ +V + +IF + 

Sbjct: 316 IGAPGMWIAGIPPELVKWTASIIFFV 342 



WO 02/34771 






PCT/GB01/04789 



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 698 

A DNA sequence (GBSx0742) was identified in S.agalactiae <SEQ ID 2151> which encodes the amino 
acid sequence <SEQ ID 2152>. This protein is predicted to be branched chain amino acid ABC transporter, 
permease protein (livM). Analysis of this protein sequence reveals the following: 



>>> Seems to have an uncleavable N-term signal seq
INTEGRAL Likelihood = [...] Transmembrane [...] - 292 ( 273 - [...])
INTEGRAL Likelihood = [...] Transmembrane [...] - 175 ( 154 - [...])
INTEGRAL Likelihood = [...] Transmembrane [...] - 252 ( 232 - [...])
INTEGRAL Likelihood = [...] Transmembrane [...] - 58 ( 38 - [...])
INTEGRAL Likelihood = [...] Transmembrane [...] - 135 ( 119 - [...])
INTEGRAL Likelihood = [...] Transmembrane [...] - 271 ( 253 - [...])
INTEGRAL Likelihood = [...] Transmembrane [...] - 82 ( 65 - [...])

Final Results

bacterial membrane --- Certainty=0.4503 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:AAD36213 [...] chain amino acid ABC transporter [...]
[...] = 33/332 (9%)

[The sequence lines of this alignment are illegible. Legible coordinates, in order of appearance: Sbjct 16; 72; 76; 115; Sbjct 136; 175; Sbjct 196; 235; Sbjct 256; 279. Surviving match-line fragments:]
L GDYLAIA+LG AE+IRI+ +N I TNG G+ GIP ++
) IAAE+MG++ K +++ FV GA A ++GSL A +
M++ VLI++VLGGLGS++G++

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 699

A DNA sequence (GBSx0743) was identified in S.agalactiae <SEQ ID 2153> which encodes the amino 
acid sequence <SEQ ID 2154>. This protein is predicted to be branched chain amino acid ABC transporter, 
ATP-binding protein (livG). Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2057 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:AAD36214 GB:AE001771 branched chain amino acid ABC transporter,
ATP-binding protein [Thermotoga maritima]
Identities = 136/271 (50%), Positives = 189/271 (69%), Gaps = 21/271 (7%)

[The first three alignment blocks are largely illegible. Legible coordinates, in order of appearance: Query 3; Sbjct 11; 63; Sbjct 71; 121; Sbjct 130. One surviving match line:]
G +P +1 LG+ RTFQNIRLF +MTVL+NVLV +H LS+P A

Query: 163 [illegible]
RALATEPK++ LDEPAAGMNP+ET +L + I QI+ DF++T++LIEHDM +VM + ERI
Sbjct: 190 MU^LATEPKLILLDEPAAGMNPKETEDLMEFIKQIRKDFNLTVLLIEHDMKVVMGICERI 249

Query: 223 YVLEYGRLIAHGTPEE I KNNKRVIEAYLGGE 253
V++YGR+IA GTP+EI+N+ RVIEAYLG E
Sbjct: 250 IVMDYGRI IAEGTPKEIQNDPRVIEAYLGRE 280

There is also homology to SEQ ID 644. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

Example 700 

A DNA sequence (GBSx0744) was identified in S.agalactiae <SEQ ID 2155> which encodes the amino 
acid sequence <SEQ ID 2156>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2216 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:CAB52068 GB:AL109732 putative branched chain amino acid
transport ATP-binding protein [Streptomyces coelicolor
A3(2)]

Identities = 136/233 (58%), Positives = 181/233 (77%)

Query: 3 MLKVENLSIHYGVIQAVMDVSFEWQGEVA'TLIGAIIGAGKTSILRTISGLVRPSQGSISF 62
+L+VE+L + YG I+AV +SF+V+ GEWTLIG KGAGKT+ LRT+SGL++P GIF
Sbjct: 4 LLEVEDLRVAYGKIEAVKGISFKVDAGEWTLIGTNGAGKTTTLRTLSGLLKPVGGQIRF 63

[The sequence lines of the remaining blocks are illegible. Legible coordinates, in order of appearance: Query 63; Sbjct 64; Query 123; Sbjct 124; 183; Sbjct 184. Surviving match lines:]
GK + K+ A +IV GLA PEGRH+F +++ +NL +GAFL+ DR +K +++ +D
FP L ER+ Q A TLSGGEQQMLAMGRALMS+PKLL+LDEPSMGL+PI +Q+I

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 701

A DNA sequence (GBSx0745) was identified in S.agalactiae <SEQ ID 2159> which encodes the amino 
acid sequence <SEQ ID 2160>. Analysis of this protein sequence reveals the following: 

Possible site: 23

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0415 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:AAD36216 GB:AE001771 conserved hypothetical protein [Thermotoga maritima]
Identities = 72/166 (43%), Positives = 116/166 (69%), Gaps = 2/166 (1%)

Query: 1 MPVKDFMTKKLVWSPDTWAE^UfflLLREHHLRRLPVVEMJQLVGLVTEGTMAEAQPSKA 60
M VKDFMT+ + ++P+T+ +EA L++++ ++RL V++N+++VG+VTE + A PSKA
Sbjct: 1 MLVKIJFMTRNPITIAPETSFSEALKLMKQNKIKRLIVMKNEKIVGIVTEKDLLYASPSKA 60

Query: 61 TSLSIYEMNYLLNKTKIRDIMIKDIVTVSQYASLEDAIYLMMSRKIGVLPWDN-GQLYG 119
T+L+I+E++YLL+K KI +IM KD+VTV++ +EDA +M + I LPWD+ G+L G
Sbjct: 61 TTLNIWELHYLLSKLKIEEIMTKDVVTVNEKTPIEDAARIMEEKDISGLPVA7DDAGRLVG 120

Query: 120 IOTDRDVFKAFLEIAGYGQE-SYRLVILADEGIGVLSKVLNRLSSA 164
I+T D+FK F+EI G +E + R + + G L +V R+ A
Sbjct: 121 I ITQTDI FICVFvEI FGTKREGTIRYTMEMPDKPGEZjLEVAKRIYEA 166

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

Example 702 

A DNA sequence (GBSx0746) was identified in S.agalactiae <SEQ ID 2163> which encodes the amino
acid sequence <SEQ ID 2164>. Analysis of this protein sequence reveals the following:

Possible site: 41

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.5585 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 703 

A DNA sequence (GBSx0747) was identified in S.agalactiae <SEQ ID 2165> which encodes the amino
acid sequence <SEQ ID 2166>. This protein is predicted to be a transposase. Analysis of this protein
sequence reveals the following:

Possible site: 38

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -1.65 Transmembrane 53 - 69 ( 53 - 70)

Final Results

bacterial membrane --- Certainty=0.1659 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
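
By way of illustration only, prediction output of the kind shown in these Examples can be read programmatically. The following Python sketch is not part of the original analysis: the function names and the regular expressions are assumptions inferred from the line layout used in this document, and the sample text is transcribed from Example 703 with OCR spacing removed.

```python
import re

# Assumed line layouts, inferred from the prediction output in these Examples.
TM_RE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)
LOC_RE = re.compile(
    r"bacterial\s+(membrane|outside|cytoplasm)\W+Certainty\s*=\s*(\d+\.\d+)"
)


def parse_prediction(text):
    """Return (transmembrane segments, certainty per location) from raw text."""
    segments = [
        {
            "likelihood": float(m.group(1)),
            "core": (int(m.group(2)), int(m.group(3))),
            "extended": (int(m.group(4)), int(m.group(5))),
        }
        for m in TM_RE.finditer(text)
    ]
    certainty = {m.group(1): float(m.group(2)) for m in LOC_RE.finditer(text)}
    return segments, certainty


def best_location(certainty):
    """Return the location with the highest certainty value."""
    return max(certainty, key=certainty.get)


# Sample values transcribed from Example 703 (hypothetical clean-up of the OCR).
EXAMPLE = """
INTEGRAL Likelihood = -1.65 Transmembrane 53 - 69 ( 53 - 70)
bacterial membrane --- Certainty=0.1659 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
"""

segments, certainty = parse_prediction(EXAMPLE)
print(segments[0]["core"], best_location(certainty))
```

Each segment keeps both the core span and the extended span given in parentheses, so either can be compared against epitope positions.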

The protein has homology with the following sequences in the GENPEPT database:

>GP:AAA85003 GB:U28972 SpV1 ORF3; putative transposase [Spiroplasma citri]
Identities = 49/154 (31%), Positives = 80/154 (51%), Gaps = 11/154 (7%)

Query: 39 WLEIfflWIGRIGGKVLLTENVAFCMFIFAKLMDSKTAIETAKHIQ -- VIKRTLYDNKRDF 96
WLEMDTV+G+ +L FA +++ TA E K + +IK L +
Sbjct: 174 WLEMDTWGKDHKSAILVLVEQLSKKYFAIKLENHTAI^VEKKFKDIIIKNNLIGKIKG- 232

Query: 97 FELFPVILTDNGGEFARVDDIEIDVCGQSQ^FFCDPNRSDQKARIEKNHTLVRDILPKGT 156
I+TD G EF++ ++EI ++Q++FCD QK IE ++ +R PKGT
Sbjct: 233 IITDRGKEFSKWREMEI--FAETQVYFCDAGSPQQKPLIEYMNSELRHWFPKGT 284

Query: 157 SFDNLTQEDINLALSHINSVKRQALNGKTAYELF 190 

F+ ++Q+ 1+ ++ IN R LN ++ E+F 
Sbjct: 285 DFNKVSQKQIDWWNVINDKLRPCLNWISSKEMF 318 

35 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

Example 704 

A DNA sequence (GBSx0748) was identified in S.agalactiae <SEQ ID 2167> which encodes the amino
acid sequence <SEQ ID 2168>. Analysis of this protein sequence reveals the following:

Final Results

bacterial cytoplasm --- Certainty=0.3116 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10055> which encodes amino acid sequence <SEQ ID
10056> was also identified.

The protein has no significant homology with any sequences in the GENPEPT database.

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 705

A DNA sequence (GBSx0749) was identified in S.agalactiae <SEQ ID 2169> which encodes the amino 
acid sequence <SEQ ID 2170>. This protein is predicted to be thymidylate kinase (tmk). Analysis of this 
protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1876 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10053> which encodes amino acid sequence <SEQ ID 
10054> was also identified. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:BAB03761 GB:AP001507 thymidylate kinase [Bacillus halodurans] 
Identities = 112/210 (53%) , Positives = 148/210 (70%) , Gaps = 1/210 (0%) 

Query: 17 MKKGLMISFEGPDGAGKTTVLEAVLPLLREKLSQDILTTREPGGVTISEEIEHIILDVKH 75 

M KG 1+ EG +GAGKT+ L+A+ +LRE ++ TREPGG+ I+E+IR IILDV H 

Sbjct: 1 MTKGCFITVEGGEGAGKTSALDMEEMLREN-GLSWRTREPGGIPIAEQ1RSIILDVDH 59 

Query: 77 TQr^KKTELLLYMAARRQHLVEKVLPALEEGKIVLMDRFIDSSVAYCjGSGRGLDKSHIKW 136 

T+MD +TE LLY AARRQHLVEKVLPALE G +VL DRFIDSS+AYQG RG+ I 
Sbjct: 60 TRMDPRTFJ^LYAAARRQHLVEKVLPALEAGHVVLCDRFIDSSLAYQGYARGIGFEDIIA 119 

Query: 137 LNDYATDSHKPDLTLYFDVPSEVGLERIQKSVQREVNRLDLEQLDMHQRVRQGYLELADS 196 

+N++A + PDLTL F V +VGL RI + RE NRLD E L HQ+V++GY + ++ 
Sbjct: 120 INEFAIEGRYPDLTLDFRVDPDVGLSRIHRDQSREQNRLDQEALTFHQKVKEGYERIVET 179 

Query: 197 EPNRIVTIDASQQLDEVIAETFSIILDRIN 226 

P R+V IDA+Q D+V+A+ +1 R++ 
Sbjct: 180 YPERWEIDANQSFDQWADAVRMIKQRLS 209 

A related DNA sequence was identified in S.pyogenes <SEQ ID 2171> which encodes the amino acid 
sequence <SEQ ID 2172>. Analysis of this protein sequence reveals the following: 

Possible site: 56 
»> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -0.75 Transmembrane 215 - 231 ( 215 - 231) 

Final Results 

bacterial membrane Certainty=0 . 1298 (Affirmative) < suco 

bacterial outside Certainty=0. 0000 (Not Clear) < suco 

bacterial cytoplasm Certainty=0 . 0000 (Not Clear) < suco 

The protein has homology with the following sequences in the databases: 

>GP:BAB03761 GB:AP001507 thymidylate kinase [Bacillus halodurans] 
Identities = 109/205 (53%), Positives = 148/205 (72%), Gaps = 1/205 (0%) 

Query: 22 MITGKLITVEGPDGAGKTTVLEQLIPLLKQKVAQDILTTREPGGVAISEHIRELILDINH 81 
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Query: 82 TAMDPKTELLLYIAARRQHLVEK\CjFALEAGQLVFIDRFIDSSVAYQGAGRGLIKRDIQW 141 

T MDP+TE LLY AARRQHLVEKVLPALEAG +V DRFIDSS+AYQG RG+ DI 
Sbjct: 60 TRMDPRTEALLYAAARRQHLVEKVLPALEAGHVVLCDRFIDSSIiAYQGYARGIGFEDILA 119 

Query: 142 IOTFATDGLEPDLTLYFDVPSElGIjARINANQQRFAnTOLDLETIEIHQRVRKGYlALAKE 201 

+NEFA +G PDLTL F V ++GL+RI+ 4-Q RE NRLD E + HQ+V++GY + + 
Sbjct: 120 INEFAIEGRYPDLTLLFRVDPDVGLSRIHRDQSREQNRLDQEALTFHQKVKEGYERIVET 179 

Query: 202 HPKRIVTIDATKPLKEWSVALEHV 226 

+P4-R+V IDA + +W+ A+ + 
Sbjct: 180 YPERWE I D7ANQS FDQWADAVRMI 204 

An alignment of the GAS and GBS proteins is shown below: 

Identities = 145/219 (66%) , Positives = 181/219 (82%) 

Query: 4 FDRIWI INKGCTMKKGLMISFEGPDGAGKTTVLEAVLPLLREKLSQDILTTREPGGVTI 63 

FD+I ++ ++G M G +1+ EGPDGAGKTTVLE ++PLL++K++QDILTTREPGGV I 
Sbjct: 9 FDKIELLKSEGNKMITGKLITVEGPDGAGKTTVLEQLIPLLKQKVAQDILTTREPGGVAI 68 

Query: 64 SEEIRHIILDVKHTQMDKKTELLLYMAARRQHLVEKVLPAI.EEGKIVLMDRFIDSSVAYQ 123 

SE IR +ILD+ HT MD KTELLLY+AARRQHLVEKVLPALE G++V +DRFIDSSVAYQ 
Sbjct: 69 SEHIRELILDINHTAMDPKTELLLYIAARRQKLVEKVLPALEAGQLVFIDRFIDSSVAYQ 128 

Query: 124 GSGRGLDKSHIKWLNDYATDSHKPDLTLYFDVPSEVGLERIQKSVQREVNRLDLEQLDMH 183 

G+GRGL K+ I+WLN++ATD +PDLTLYFDVPSE+GL RI + QREVNRLDLE +++H 
Sbjct: 129 GAGRGLIKADIQWLNEFATDGIiEPDLTLYFDVPSEIGIiARINANQQREvNRLDLETIEIH 188 

Query: 184 QRVRQGYLELADSEPNRIVTIDASQQLDEVIAETFSIIL 222 

QRVR+GYL LA P RIVTIDA++ L EV++ +L 
Sbjct: 189 QRVRKGYLAIAKEHPKRIVTIDATKPLKEWSVALEHVL 227 
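
The Identities and Positives counts reported for each alignment follow from the middle line of the alignment, in which a letter marks an identical residue and "+" marks a conservative substitution. A minimal Python sketch of that count is given here for illustration; the function name and the short toy sequences are hypothetical and are not taken from the sequence listing.

```python
def score_alignment(query, midline, subject):
    """Count identities, positives and gaps in one aligned block.

    Assumes the three strings have equal length, that the middle line
    carries a letter for each identical pair and '+' for each conservative
    substitution, and that gaps are written as '-' in either sequence.
    """
    if not (len(query) == len(midline) == len(subject)):
        raise ValueError("aligned strings must have the same length")
    identities = sum(1 for c in midline if c.isalpha())
    positives = identities + midline.count("+")
    gaps = query.count("-") + subject.count("-")
    return identities, positives, gaps


# Toy block (made up for this sketch): 8 aligned columns, one gap.
toy_query = "MKTAY-LL"
toy_mid = "M+TAY L "
toy_sbjct = "MRTAYGLV"

identities, positives, gaps = score_alignment(toy_query, toy_mid, toy_sbjct)
print(f"Identities = {identities}/8, Positives = {positives}/8, Gaps = {gaps}/8")
```

Because the count relies on the middle line, it is only as reliable as the transcription of that line.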

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 706 

A DNA sequence (GBSx0750) was identified in S.agalactiae <SEQ ID 2173> which encodes the amino 
acid sequence <SEQ ID 2174>. This protein is predicted to be DNA polymerase III delta' subunit (dnaZX). 
Analysis of this protein sequence reveals the following: 

Possible site: 26

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2603 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 



DLKRTQPKLLEKFNTILQSDRMSHAYLFSGNFA3 - - LDMALYLAQSQFCEKRQSGLPCQE 5 9 
+L + QP + L R++HAY+F GN + MAL+LA+S FC +R PCQ 

NLAKNQPFVATMLKNSLAKGRLAHAYIFDGNRGIGKKRI'IALHIJUCSFFCAQPAGVEPCQ 64 
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Query: 120 AANSLLKFIEEPQSSSYVILLTNDENNVLPTIKSRTQIFRF-PKQLDMLVHQAEQAGLLK 178 

AANSLLKF+EEP + + ILLT N+LPTIKSR+Q+ F P ++ E+ G+ + 

Sbjct: 125 AMSLLKFLEEPIiMJTVAILLTEQLQWMLPTIKSRSQVLSFAPLEVQAFAKLLEEEGISE 184 

Query: 179 SQASLLAQV 187

S ++LLA +
Sbjct: 185 SVSNLLASL 193

A related DNA sequence was identified in S.pyogenes <SEQ ID 2175> which encodes the amino acid
sequence <SEQ ID 2176>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2685 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below:

Identities = 151/290 (52%), Positives = 213/290 (73%), Gaps = 3/290 (1%)

Query: 1 MDLKRTQPKLLEKFNTILQSDRMSHAYLFSGNFASLDMALYIAQSQFCEKRQSGLPCQEC 50 

MDL + P + + F TIL+ DR++HAYLFSG+FA+ +MAL+IA+ FCE+++ PC C 
Sbjct: 1 MDIAQKAPNVYQAFQTILKKDRENHAYLFSGDFANEEMALFLAKVIFCEQKKDQTPCGHC SO 

25 

Query: 51 RACRLIANGEFSDVKIIEPQGQLIKTETIKELTJCDFSRSGFEGKSQVFIIKDCEKMHVNA 120 

R+C+LI G+F+DV ++EP GQ+IKT+ +KE+ +FS++G+E K QVFIIKDC+KMH+NA 
Sbjct: 61 RSCQLIEQGDFADVTVLEPTGQVIKTDVWCEMMANFSQTGYENKRQVFIIKDCDKMHINA 120 

30 Query: 121 ANSLLKFIEEPQSSSYVILLTNDENNVLPTIKSRTQIFRFPKQLDMIjVHQAEQAGLLKSQ 180 

ANSLLK+IEEPQ +Y+ LLTOD+N VLPTIKSRTQ+F+FPK L A++ GLL Q 
Sbjct: 121 ANSLLKYIEEPQGEAYIFLLTNDDNKVLPTIKSRTQVFQFPKKEAYLYQLAQEKGLLNHQ 180 

Query: 181 ASL^QVADDPKHLEILLTNKKLLDYmLSQQFVTTLAKDRQTAYLEVSRLTSQVVDKND 240 
35 A L+A++A + HLE LL KLL+ + +++FV+ KD+ AYL ++RL +K + 

Sbjct: 181 AKLVAKJuATNTSHLERLLQTSKLLELITQAERFVSIWLKDQLQAYLAI^LVQLATEKEE 240 

Query: 241 QAFVFQWLTIMIAKE GQLYDLEKTTYRAQQMWKSIWSFQNSLEYMVLS 287 

Q V LT++LA+E L LE Y+A+ MW+SNV+FQN+LEYMV+S 

40 Sbjct: 241 QDLVLTLLTLLLARERAQ/TPLTQLEAVYQARLMWQSNVNFQNTLEYMVMS 290 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 707 

A DNA sequence (GBSx0751) was identified in S.agalactiae <SEQ ID 2177> which encodes the amino
acid sequence <SEQ ID 2178>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2016 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:BAB03765 GB:AP001507 unknown conserved protein in B. subtilis
[Bacillus halodurans]
Identities = 45/115 (38%), Positives = 62/116 (52%), Gaps = 8/116 (6%)
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Query: 1 MDKKDLFDAFDDFSQNLLVGLSEIETMKKQIQKLLEENTVLRIENGKLRERLSVIEAET- 59 

M+KK +F + + E+ +K+Q+ L+EEN L I EN LRERL E E 

Sbjct: 1 ^KKAIFTQVSQLEERIGELHRELGGLKEQLAYLIEENHFLTIENEHLRERLGEPELEET 60 

5 Query: 60 ETAVKNSK QGRELIiEGiyNDGFHICNTFYGQRRENDEECAFCIELLYRD 108 

E K K +G + L +Y +GFHICNT YG R+N E+C FC+ L +D 
Sbjct: 61 EEKEQVTKERKPFVGEGYDNLARLYQEGFHICNTHYGSLRKNGEDOjFCLSFLNQD 116 

A related DNA sequence was identified in S.pyogenes <SEQ ID 2179> which encodes the amino acid
sequence <SEQ ID 2180>. Analysis of this protein sequence reveals the following:

Possible site: 22

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0700 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below:

Identities = 75/107 (70%), Positives = 89/107 (83%), Gaps = 1/107 (0%)

Query: 1 MDKKDLFDAFDDFSQlsTLLVGLSEIETMKKQIQKLLEENTVLRIENGKLRERLSVIEAETE 60 

++KK+LFDAFD FSQNL+V L+EIE MKKQ+Q L+EENT+LR+EN KLRERLS +E ET 
Sbjct: 1 VNKKELFDAFDGFSQNLMOTIiAEIEAMKKQVQSLVEENTILRLENTKLRERLSHLEHET- 59 

25 

Query: 61 TAVKNSKQGRELLEGIYNDGFHICNTFYGQRRENDEECAFCIELLYR 107 

A SKQ ++ LEGIY++GFHICN FYGQRRENDEEC FC ELL R 
Sbjct: 60 VAKNPSKQRKDHLEGIYDEGFHICNFFYGQRRENDEECMFCRELLDR 106 
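
The summary line that precedes each alignment can be checked arithmetically, which is useful when figures may have been corrupted in transcription. The Python sketch below is illustrative only: the helper names are hypothetical, and the assumption that percentages are truncated rather than rounded is inferred from the figures quoted for this alignment, not stated in the source.

```python
import re

# Matches "Identities = 75/107 (70%)" and the Positives and Gaps equivalents.
SUMMARY_RE = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)/(\d+)\s*\((\d+)%\)"
)


def parse_summary(line):
    """Return {field: (count, length, reported_percent)} for one summary line."""
    return {
        name: (int(count), int(length), int(percent))
        for name, count, length, percent in SUMMARY_RE.findall(line)
    }


def percent_floor(count, length):
    """Integer percentage with truncation (an assumed convention)."""
    return (100 * count) // length


line = "Identities = 75/107 (70%), Positives = 89/107 (83%), Gaps = 1/107 (0%)"
stats = parse_summary(line)
consistent = {
    name: percent_floor(count, length) == percent
    for name, (count, length, percent) in stats.items()
}
print(consistent)
```

A field flagged as inconsistent points to a figure worth re-reading against the original page rather than to an error in the analysis itself.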

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 708 

A DNA sequence (GBSx0752) was identified in S.agalactiae <SEQ ID 2181> which encodes the amino 
acid sequence <SEQ ID 2182>. Analysis of this protein sequence reveals the following: 

Possible site: 48

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -1.28 Transmembrane 119 - 135 ( 119 - 135)

Final Results

bacterial membrane --- Certainty=0.1510 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10051> which encodes amino acid sequence <SEQ ID
10052> was also identified.

The protein has homology with the following sequences in the GENPEPT database: 

>GP:BAB03768 GB:AP001507 unknown conserved protein [Bacillus halodurans] 
Identities = 138/287 (48%) , Positives = 189/287 (65%) , Gaps = 2/287 (0%) 

50 Query: 4 MQVQKSFKSNIHYGTLYLVPTPIGNLDDMTFRAIRILREVDFICAEDTRNTGLLLKHFDI 63 

M+ Q+S++ GTLYLV TPIGNL+D+TFHAIR L+E D I AEDTR T LL HFDI 

Sbjct: 1 MKTQQSYQQRDDKGTLYLVATPIGNLEDVTFRAIRTLKEADQIAAEDTRQTKKLLNHFDI 60 

Query: 64 TTKQISFHEHNAYDKISGLIDLLKEGKSLAQVSDAGMPSISDPGHDDVKAAIEGDIPWS 123 
55 " TK +S+HEHN LID L EG+++A VSDAGMP+ISDPG++LV +AI+ I V+ 

Sbjct: 61 ATKLVSYHEHNKETMGKRLIDDLIEGRTIALVSDAGMPAISDPGYELWSAIKEGIAVIP 120 
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Query: 124 IPGASAGITALIASGLAPQPHIFYGFLPRKKGQQITFFETKQDYPETQIFYESPFRVSDT 183 

IPGA+A +TALIASGL + F GFLPR+K Q+ E + T IFYESP R+ DT 
Sbjct: 121 IPGANAAVTALIASGLPTESFQFIGFLPRQKKQRRQ.ALEETKPTKATLIFYESPHRLKDT 180 

Query: 184 LKHMKEIYGDRQWLVRELTKLYEEYQRGTISQLLEHIEK\'PLKGECLI1VDGKRDTERV 243 

L M I G+R V + RELTK YEE+ RGT+ + + + +KGE +IV+G + 
Sbjct: 181 LDDMLLILGNRHVSICRELTKTYEEFLRGTLEEAVHWAREATIKGEFCL1VEGNGEKVEP 240 

Query: 244 KDS - - SQQDPLVLVKEYIANGDKTNQAI KKVAKEFNLNRQELYASFH 288 

++ P+ V+ YIA G ++ +AIK+VA + + ++++Y +H 

Sbjct: 241 EEVWWESLSPVQHVEHYIALGFRSKEAIKQVATDRGVPKRDIYNIYH 287 

A related DNA sequence was identified in S.pyogenes <SEQ ID 2183> which encodes the amino acid 
sequence <SEQ ID 2184>. Analysis of this protein sequence reveals the following: 

Possible site: 35
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -4.09 Transmembrane 116 - 132 ( 116 - 134)

Final Results

bacterial membrane --- Certainty=0.2635 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

Query: 1 MQVQKSFKDKKTSGTLYLVPTPIC-NLQDMTFRAVATLKEVDFICASDTRNTGLLLKIIFDI 60
M+ Q+S++ + GTLYLV TPIGNL+D+TFRA+ TLKE D I AEDTR T LL HFDI
Sbjct: 1 MKTQQSYQQRDDKGTLYLVATPIGNLEDOTFRAIRTLKEADQIAAEDTRQTKKLLNHFDI 60

[The sequence lines of the remaining blocks are illegible. Legible coordinates, in order of appearance: 61; Sbjct 61; 121; Sbjct 121; 181; Sbjct 181; Query 239; Sbjct 241. Surviving match lines:]
+PGA+A +TALIASGL + F GFLPR+ Q++ E+ T +FYESP+R+KDT
L +ML G+R V + RELTK +EE+ RG++ E + + E +KGE LIV
V V+ I G -t

An alignment of the GAS and GBS proteins is shown below: 

Identities = 208/287 (72%) , Positives = 238/287 (82%) 

Query: 4 MQVQKSFKSNIHYGTLYLVPTPIGNLDDMTFRAIRILREVDFICAEDTRNTGLLLKHFDI 63 

MQVQKSFK GTLYLVPTPIGNL DMTFRA+ L+EVDFICAEDTRNTGLLLKHFDI 

Sbjct: 1 MQVQKSFKDKKTSGTLYLVPTPIGNLQDMTFRAVATLKEVDFICAEDTRNTGLLLKHFDI 60 

Query: 64 TTKQISFHEHNAYDKISGLIDLLKEGKSLAQVSDAGMPSISDPGHDLVKAAIEGDIPWS 123 

TKQISFHEHNAY+KI LIDLL G+SLAQVSDAGMPSISDPGHDLVKAAI + DI W+ 
Sbjct: 61 ATKQISFHEHNAYEKIPDLIDLLISGRSLAQVSDAGMPSISDPGHDLVKAAIDSDIAWA 120 

Query: 124 IPGASAGITALIASGLAPQPHIFYGFLPRKKGQQITFFETKQDYPETQIFYESPFRVSDT 183 

4-PGASAGITALIASGLAPQPH+FYGFLPRK GQQ FFE K YPETQ+FYESP+R+ DT 
Sbjct: 121 LPGASAGITALIASGLAPQPHVFYGFLPRKAGQQKAFFEDKHHYPETQMFYESPYRIKDT 180 



Query: 184 LKHMIffilYGDRQWLVRELTKLYEEYQRC-TISQLLEHIEKVPLKGECHIVDGKRDTERV 243 
L +M YGDRQWLVRELTKL+EEYQRG+IS++L ++E+ PLKGECL+IV G + V 
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Sbjct: 181 LTNMLACYGDRQWLVRELTKLFEEYQRGSISEILSYLEETPLKGECLLIVAGAQADSEV 240 

Query: 244 KDSSQQDPLVLVKEYIANGDKTNQAIKKVAKEFMLNRQELYASFHDL 290 
+ ++ D + LV++ I G K NQAIK +AK + +NRQELY FHDL 
5 Sbjct: 241 ELTADVDLVSLVQKEIQAGAKPNQAIKTIAKAYQVNRQELYQQFHDL 287 

A related GBS gene <SEQ ID 8643> and protein <SEQ ID 8644> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 10
McG: Discrim Score: -6.92
GvH: Signal Score (-7.5): -9.26
Possible site: 48
>>> Seems to have no N-terminal signal sequence
ALOM program count: 1 value: -1.28 threshold: 0.0
INTEGRAL Likelihood = -1.28 Transmembrane 118 - 134 ( 118 - 134)
PERIPHERAL Likelihood = 6.89 32
modified ALOM score: 0.76

*** Reasoning Step: 3

20 

Final Results 

bacterial membrane 
bacterial outside 
bacterial cytoplasm 

25 

The protein has homology with the following sequences in the databases: 

ORF00263 O10 - 1164 of 1470) 

EGAD|17863|BS0036(2 - 289 of 292) hypothetical 33.0 kd protein in xpac-abrb intergenic 
region {Bacillus subtilis} OMNI |NT01BS0044 conserved hypothetical protein 

30 SP|P37544|YABC_BACSU HYPOTHETICAL 33.0 KDA PROTEIN IN XPAC-ABRB INTERGENIC REGION. 

GP|467425|dbj |BAA05271.1| |D26185 unknown {Bacillus subtilis} 

Gpj 2632303 |emb|CAB11812.l| |Z99104 similar to hypothetical proteins {Bacillus subtilis} 
PIR|S66065|S66065 conserved hypothetical protein yabC - Bacillus subtilis 
fcMatch = 24.5 

35 %Identity =45.8 %Similarity =65.7 

Matches =131 Mismatches = 97 Conservative Sub.s = 57 

123 153 183 213 243 273 303 333 

CSTH*KW*TS*ASERY*SRNRNCS*KF*TRKRITRRHLQ*WLSHL*YFLWSTS*K*RRMCFLY*III*RLMEMQVQKSFK 
40 :: | || 

MLRRQMSFN 

363 393 423 453 483 513 543 573 

SNIHYGTLYLVPTPIGNLDDMTFRAIRILREVDFICAEDTRKTGLLLKHFDITTKQISFHEHNAYDKISGLIDLLKEGKS 
45 I I I I I I I 1 I I I M I I I I I I 1= II I MUM I -II =1=1111 =1= II 11= 

GKSDMGILYLVPTPIGNLEDMTFRAIDTLKSVDAIAAEDTRQTKKLCI-JVYEIETPLVSYHEHNKESSGHKIIEWLKSGKN 
20 30 40 50 60 70 80 

603 633 663 693 723 753 783 813 

50 LAQVSDAGMPSISDPGHDLVKAAIEGDIPWSIPGASAGITALIASGLfiPQPHIFYGFLPRKKGQQITFFETKQDYPETQ 
= 1 11111=1=11111 ::|| = II =111 = 1 =1111111= III Mill 1 = 1 =1 = II 

IALVSDAGLPTISDPGAEIVKDFTDIGGYWPLPGANAaLTRLIASGIVPQPFFFYGFLNRQKKEKKKELEALKKRQETI 
100 110 120 130 140 150 160 

55 843 873 903 933 963 993 1023 1053 

IFYESPFRVSDTLKHMKEIYGDRQVVLvRELTKLYEEYQRGTISQLLEHIEKWPLKGECLIIvDGKFJDTERVKDSSQQDP 
llll:] |: :|| I II llh: = lllll 111= 11111 = = = = ==ll = = l = l « I = = = 

IFYFAPHRLKETLSAMAEILGDREIAVTRELTKKYEEFIRGTISEVIGWAiSJEDQIRGEFCLVvEGSISINEEvDEEEQWWET 
180 190 200 210 220 230 240 

60 

1074 1104 1134 1164 1194 1224 1254 1284 

LVL VKEYIANGDKTNQAIKKVAKEFKIIiNRQELYASFHDL*VII*KGCQRKIWQPFIISDLAIGIKK*DTSNFLKIFN 

I h lh I = =1111 I = 1= = = l = l = = l 

LTAKEHVEHYI SKGATSKEAI KKAA VDRNVPKREVYDAYH I KQ 
65 260 270 280 290 



bacterial membrane --- Certainty=0.1510 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

SEQ ID 8644 (GBS343) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell
extract is shown in Figure 72 (lane 11; MW 35.4kDa).

The GBS343-His fusion product was purified (Figure 215, lane 4) and used to immunise mice. The
resulting antiserum was used for FACS (Figure 277), which confirmed that the protein is immunoaccessible
on GBS bacteria.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 709 

A DNA sequence (GBSx0753) was identified in S.agalactiae <SEQ ID 2185> which encodes the amino
acid sequence <SEQ ID 2186>. This protein is predicted to be bA483F11.3 (cutC). Analysis of this protein
sequence reveals the following:

Possible site: 41

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2558 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:CAB88199 GB:AL133353 bA483F11.3 (CGI-32 protein) [Homo sapiens]
Identities = 79/203 (38%), Positives = 116/203 (56%), Gaps = 7/203 (3%)

Query: 3 LREFCAENLTDLTRLDKAIISRVELCDNLAVGGTTPSYGVII02ANQYLHEKGISVAVMIR 62
L E C +++ ++ R+ELC L+ GGTTPS GV++ Q + IV VMIR
Sbjct: 27 LMEVCVDSVESAWAERGGADRIELCSQLSEGGTTPSMQVLQVVKQSVQ- - - 1PVFVMIR 83

Query: 63 PRGGNFVYNDLELRIMEEDILRAVELESDALVLGILTSNNHIDTEAIEQLLPATQGLPLV 122
PRGG+F+Y+D E+ +M+ DI A +D LV G LT + HID E L+ + LP+
Sbjct: 84 PRGGDFDYSDREIEVMKADIRLAKLYGADGLVFGALTEDGHIDKELCMSLMAICRPLPVT 143

Query: 123 FHMAFDVlPKSDQKKSIDQLVALGFTRILLHGSSNGEPIIENIKHIKALVEyANNRIEIM 182
FH AFD4-+ D +++ L+ LGF R+L G + +E + IK L+E A RI +M
Sbjct: 144 FHRAFDMV- -HDPMAALETLLTLGFERVLTSC-CDSS- -ALEGLPLIKRLIEQAKGRIWM 199

Query: 183 VGGGVTAENYQYI CQETGVKQAH 205
GGG+T N Q I + +G + H
Sbjct: 200 PGGGITDRNLQRILEGSGATEFH 222





A related DNA sequence was identified in S.pyogenes <SEQ ID 2187> which encodes the amino acid
sequence <SEQ ID 2188>. Analysis of this protein sequence reveals the following:

Possible site: 57

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2372 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below:

Identities = 143/208 (68%), Positives = 168/208 (80%)

Query: 2 ILREFCSffiNLTDLTRLDKAIISRVELCDNLAVGGTTPSYGVIKEANQYLHEKGISVAVMl 61 
+++EFCAENLT L LD ISRVELCDNLAVGGTTPSYGVIKEA Q LH+K ISVA Ml 
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Sbjct: 1 MIKEFCAENLTLLPTLDAGQISRVELCDNLA.VGGTTPSYGVIJ5EACQLLHDKKISVATMI 60 

Query: 62 RPRGGNFVYNDLELRIMEEDILRAVELESDALVLGILTSNNHIDTEAIEQLLPATQGLPL 121 

RPRGG+FVYNDLEL+ MEEDIL+AVE SDALVLG+LT+ N +DT+AI EQLLPATQGLPL 
Sbjct: 61 RPRGGDFVY1TOLELKAMEEDILKAVEAGSDALVLGLLTTENQLDTDAIEQLLPATQGLPL 120 

Query: 122 VFHMAFDVIPKSDQKKSIDQLVALGFTRILLHGSSNGEPI1ENIKHIKALVEYANNRIEI 181 

VFHMAFD IP Q +++DQL+ GF R+L HGS PI +N++ +K+LV YAN RIEI 
Sbjct: 121 VFHMAFDRIPTDHQHQALDQLIDYGFVRVLTHGSPEATPITDNVEQLKSLVTYANKRIEI 180 

Query: 182 MVGGGVTAENYQYI CQETGVKQAHGTRI 209 

M+GGG+TAEN Q + Q TG HGT+I 
Sbjct: 181 MIGGGITAENCQSLSQLTGTAIVHGTKI 208 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.



Example 710 

A DNA sequence (GBSx0754) was identified in S.agalactiae <SEQ ID 2189> which encodes the amino 
acid sequence <SEQ ID 2190>. Analysis of this protein sequence reveals the following: 

Possible site: 23

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1216 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:BAA12206 GB:D84061 phosphoserine aminotransferase [Spinacia
oleracea]

Identities = 65/109 (59%), Positives = 79/109 (71%), Gaps = 1/109 (0%)

Query: 3 IYWFSAGPAVLPKPVLVKAQSELLNYQGSSMSVLEVSHRSKEFDDIIKGAERYLRDLMGI 62 
++NF+AGPAVLP+ VL KAQSELLN++GS MSV+E+SHR KEF II AE LR L+ I 
Sbjct: 69 VFNFAAGPAVLPENVLQKAQSELLNWRGSGMSVMEMSHRGKEFTSIIDKAEADLRTLLNI 128

Query: 63 PDNYKVIFLQGGASLQFSMIPLNIARGRKAY-YHVAGSWGEKSLYRGCK 110 

P +Y V+FLQGGAS QFS IPLN+ A Y V GSWG+K+ K 

Sbjct: 129 PSDYTVLFLQGGASTQFSAIPLNLCTPDSAVDYIVTGSWGDKA&KEAAK 177 

40 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.



Example 711

A DNA sequence (GBSx0755) was identified in S.agalactiae <SEQ ID 2191> which encodes the amino
acid sequence <SEQ ID 2192>. Analysis of this protein sequence reveals the following:

Possible site: 24 

>>> Seems to have an uncleavable N-term signal seq

Final Results

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 712 

A DNA sequence (GBSx0756) was identified in S.agalactiae <SEQ ID 2193> which encodes the amino 
acid sequence <SEQ ID 2194>. This protein is predicted to be phosphoserine aminotransferase (serC). 
Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.338D (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10049> which encodes amino acid sequence <SEQ ID 
10050> was also identified. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAF94318 GB:AE004196 phosphoserine aminotransferase [Vibrio cholerae] 
Identities = 104/210 (49%) , Positives = 152/210 (71%) , Gaps = 3/210 (1%) 

Query: NNTIEGTSLYDIPKTNEVPVIADMSSNILAVKYKVEDFAMIYAGAQKKIGPAGVTWIIR 63
N TI+G + D+P T++ P++ADMSS IL+ + V + +IYAGAQKNIGPAG+ + I+R
Sbjct: NETIDGIEINDLPVTDK-PIVADMSSTILSREIDVSKYGVIYAGAQKNIGPAGICIAIVR 228

[Rows Query 64 / Sbjct 229, Query 123 / Sbjct 289 and Query 183 / Sbjct 348: only the following fragments are legible in the source]
L +L+YKI ++ S++NTPP ++ Y++ LVF+W+K+ GGV A+E+ X
K+ LLY YIDSS+FY I
GV L+DFMK FEA+
GVQALVDFMKEFEAQ 377

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 713 

A DNA sequence (GBSx0757) was identified in S.agalactiae <SEQ ID 2195> which encodes the amino 
acid sequence <SEQ ID 2196>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0466 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10047> which encodes amino acid sequence <SEQ ID
10048> was also identified.

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAB73701 GB:AL139079 putative acetyltransferase [Campylobacter jejuni]

Identities = 46/170 (27%), Positives = 78/170 (45%), Gaps = 13/170 (7%) 

Query: 7 IRLAFPNEIDQIMLLIEEARAEIAKTGSDQWQK3DGYPNRNDIIDDILNGYAWVGIEDGM 66
I+ A +++ 1+ + ++A + QW ++ YPN +DI +V E+
Sbjct: 6 IQKAVNKDLNSILEITKDALNAMKTMNFHQW--DENYPNEIVFQEDIQAQELYVFKENDE 63

Query: 67 LATYAAVIDGHE-EVYDAIYEGKWLHDNHRYLTFHRIAISNQFRGRGLAQTFLQGL 121
+ + ++ +EY + K D YL HR+A+ +G+G+AQ Ii
Sbjct: 64 ILGFICINEKFKPEFYKQVTFNKNYDDKAFYL--HRLAVKQNAKGKGVAQKLkNFCENFA 121

Query: 122 IEGHKGPDFRCDTHEKNVTMQHILNKLGYQYCGKVPLDGVR---LAYQKI 168
+E HK R DTH KN M + KL + +CG + + LAY+KI
Sbjct: 122 LENHKA-SLRADTHSKNFPMNSLFKKLDFNFCGNFDIPNYQDPFLAYEKI 170

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 714 

A DNA sequence (GBSx0758) was identified in S.agalactiae <SEQ ID 2197> which encodes the amino 
acid sequence <SEQ ID 2198>. Analysis of this protein sequence reveals the following: 



Final Results 

bacterial cytoplasm --- Certainty=0.2968 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.
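
The "Final Results" lines in these examples assign a certainty value and a call to each of three compartments (cytoplasm, membrane, outside). As a rough illustration only, the sketch below reads such lines and returns the compartment with the highest certainty. The names `parse_final_results` and `predicted_site` and the regular expression are assumptions made for this sketch; they are not part of the localisation program used in the analysis.

```python
import re

# Hypothetical pattern for one result line, e.g.
#   bacterial cytoplasm --- Certainty=0.2968 (Affirmative) < succ>
# Spaces inside the number ("0. 2968") are tolerated because they occur
# in scanned copies of this kind of output.
LINE = re.compile(
    r"bacterial\s+(\w+)\W*Certainty\s*=\s*([\d.\s]+?)\s*\(([^)]*)\)"
)


def parse_final_results(text):
    """Return a list of (compartment, certainty, call) tuples."""
    results = []
    for site, value, call in LINE.findall(text):
        results.append((site, float(value.replace(" ", "")), call.strip()))
    return results


def predicted_site(text):
    """Return the compartment with the highest certainty, or None."""
    results = parse_final_results(text)
    return max(results, key=lambda r: r[1])[0] if results else None


sample = """bacterial cytoplasm --- Certainty=0.2968 (Affirmative) < succ>
bacterial membrane --- Certainty=0. 0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000(Not Clear) < succ>"""
print(predicted_site(sample))
```

When several compartments share the top certainty, `max` returns the first one listed, so the order of the input lines decides the tie.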

Example 715 

A DNA sequence (GBSx0759) was identified in S.agalactiae <SEQ ID 2199> which encodes the amino 
acid sequence <SEQ ID 2200>. This protein is predicted to be D-3-phosphoglycerate dehydrogenase 
(serA). Analysis of this protein sequence reveals the following: 

Possible site: 54 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3102 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10045> which encodes amino acid sequence <SEQ ID
10046> was also identified.

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAB99020 GB:U67544 phosphoglycerate dehydrogenase (serA) 
[Methanococcus jannaschii] 
Identities = 102/313 (32%) , Positives = 168/313 (53%) , Gaps = 21/313 (6%) 





[Row Query 31 / Sbjct 40: only the following match-line fragment is legible in the source]
LK I RAG G +NI +E A+ +GI+V N P A++

Query: 88 VKEaviAALLLSARDYLGANRWVNTLTGTDIPKQIEAGKKAFAGNEIAGKKLGVIGLGAI 147
V E + +L +AR N T K+ E +K F G E+ GK LGVIGLG I
Sbjct: 100 VAELTMGLMLAAAR NIPQATASLKRGEITORKRFKGIELYGKTLGVIGLGRI 150

[Rows Query 148, Query 208 / Sbjct 209, Query 262 / Sbjct 269 and Query 321 / Sbjct 329: only the following match-line fragments are legible in the source]
++MKK I+N AR L++ + L+EA++ G +
PH G ST+EA+



There is also homology to SEQ ID 124. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.



Example 716 

A DNA sequence (GBSx0760) was identified in S.agalactiae <SEQ ID 2201> which encodes the amino 
acid sequence <SEQ ID 2202>. This protein is predicted to be methylated-DNA--protein-cysteine
S-methyltransferase (ogt). Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2460 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAF96913 GB:AE004427 methylated-DNA--protein-cysteine
S-methyltransferase [Vibrio cholerae]
Identities = 73/156 (46%), Positives = 99/156 (62%), Gaps = 9/156 (5%) 

Query: 7 YQSPLGEIRL1ADNLGLSGLYFVGQKYDMLAVNQEEIVNMSNSYTLLGK--KWLDAYFSQ 64 

Y SPLG + L A + GL G++F Q E 4- + +L K + LD YFS 

Sbjct: 7 YSSPLGPMTLQASSQGIiLGVWFATQ TTQPEHLGDYVKECPILNKTIRQLDEYFSG 61 



Query: 65 QNLP-SIPLSLRGTAFQTRVWQELQKIPFGDTKTYGELAKEL-NCQSAQAVGGAIGKNSI 122 
Q +PL+ GTAFQ VW L KIP+G+ +Y +LA+ + N -H- +AVG A GKN I 
Sbjct: 62 QRTQFELPIMSGTAFQQSVWHALCKIPYGEIWSYQQLAERIGNPKAVRAVGLANGKNPI 121

Query: 123 SLIIPCHRVLGRYGQLTGYAGGLERKSWLLEYEKEK 158
S+I+PCHRV+G+ GQLTGYAGGLERK++LLE EK +
Sbjct: 122 SIIVPCHRVVGKNGQLTGYAGGLERKAFLLELEKRR 157

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 717 

A DNA sequence (GBSx0761) was identified in S.agalactiae <SEQ ID 2203> which encodes the amino
acid sequence <SEQ ID 2204>. Analysis of this protein sequence reveals the following: 



Final Results

bacterial cytoplasm --- Certainty=0.3137 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:BAB07204 GB:AP001518 arsenate reductase [Bacillus halodurans] 
Identities = 56/107 (52%), Positives = 74/107 (68%), Gaps = 1/107 (0%)

Query: 3 TFYEYPKCTTCRSAKKELTELGLTFEAIDIKSNPPKVSLLKELLENSPYDLKKFFNTSGN 62 

TFY+YPKC TC+ AKK L + G+ ++ I PP LK+L E S +LKKFFNTSG 
Sbjct: 4 TFYQYPKCGTCQKAKKWLDQHGIEVNSVHIVEQPPSKEELKQLYEQSGLELKKFFNTSGK 63 

Query: 63 SYRELGLKDKFDDLTLDQALDLLRSDGMLIKRPLLVKDNKILQIGYR 109 

YRELGLKDK + + D+ L+ LASDGMLIKRP+L +K+ +G++ 
Sbjct: 64 KYRELGLKDKVKEASEDELLETIASDGMLIKRPILTDGDKV-TVGFK 109 

A related DNA sequence was identified in S.pyogenes <SEQ ID 2205> which encodes the amino acid
sequence <SEQ ID 2206>. Analysis of this protein sequence reveals the following: 

Possible site: 38 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3969 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 64/99 (64%), Positives = 79/99 (79%) 



Query: 79 DQALDLIASDGMLIKRPLLvKDNKILQIGYRTKYKDIJSIL 117 

D+A +LLA+DGMLIKRP+L+KD +LQ+GYR Y++L+L 
Sbjct: 63 DKAAELLATDGMLIKRPILIKDGNVLQVGYRKPYQELDL 101 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 718

A DNA sequence (GBSx0762) was identified in S.agalactiae <SEQ ID 2207> which encodes the amino 
acid sequence <SEQ ID 2208>. This protein is predicted to be exodeoxyribonuclease (exoA). Analysis of
this protein sequence reveals the following: 

Possible site: 22

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1859 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database: 



Query: 1 MKLISWNIDSMAALTSESTRALMSRQVIDTLVAEDADIIAIQETKLSAKGPTKKHLEVl 60 

NIKLISWNIDSLNAALTS+S RA +S++V+ TLVAE+ADIIAIQETKLSAKGPTKKH+E+L 
Sbjct: 1 MKLISWNIDSLNAALTSDSARAKLSQEVLQTLVAENADIIAIQETKLSAKGPTKKHVEIL 60 

Query: 61 ETyFPEYDLVWRSSvEPARKGYAGTMFLYRKGtNPIVSFPEIDAPTTMDNEGRIITLELE 120 

E FP Y+ WRSS EPARKGYAGTMFLY+K L P +SFPEI AP+TMD EGRIITLE + 
Sbjct: 61 EELFPGYENTWRSSQEPARKGYAGTMFLYKKELTPTISFPEIGAPSTMDLEGRIITLEFD 120 

Query: 121 NCYITQWTPNAGDGLKRIADRQIWDIJCYAEfXATLDSQKPVIATGDYNVAHKEIDLANP 180 

++TQVYTPNAGDGLKRL +RQ+WD KYAEYLA LD +KPVLATGDYNVAH EIDLANP 
Sbjct: 121 AFFVTQWTPNAGDGLKRLEERQVTOAKYAEYLAELDKEKPVmTGDYNVAHNEIDLANP 180 

Query: 181 SSNPJ^SAGFTAEERQGFTNLIAKGFTDTFRYIBGDVPNVYSWWAQRSRTSKINNTGWRID 240 

+SNRRS GFT EER GFTNLIA GFTDTFR++HGDVP Y+WWAQRS+TSKINNTGSIRID 
Sbjct: 181 ASNRRSPGFTDEERAGFTNLLATGFTDTFRHVHGDVPERYTWWAQRSKTSKINNTGWRID 240 

Query: 241 YWLTSNRVADKITKSEMIHSGDRQDHTPI ILEIEL 275 

YWLTSNR+ADK+TKS+MI SG RQDHTPI+LEI+L 
Sbjct: 241 YWLTSNRIADKVTKSDMIDSGARQDHTPIVLEIDL 275 

A related DNA sequence was identified in S.pyogenes <SEQ ID 2209> which encodes the amino acid 
sequence <SEQ ID 2210>. Analysis of this protein sequence reveals the following:

Possible site: 



>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2181 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below: 

Identities = 221/275 (80%) , Positives = 251/275 (90%) 

Query: 1 MKIiISWNIDSLNAALTSESTRALMSRQVIDTLVAEDADIIAIQETKLSAKGPTKKHIiEVIi 60 

MKLISWNIDSLNAALT ES RAL+SR V+DTLVA+DADIIAIQETKLSAKGPTKKH+E L 
Sbjct: 1 MKLISraiDSUSIRALTGESPRALLSRAvIjDTLVAQDADIIAIQETKLSAKGPTKKHIETL 60 

Query: 61 ETYFPEYDLVWRSSVEPARKGYAGTMFLYRKGLNPIVSFPEIDAPTTMDNEGRIITLEIjE 120 

+YFP Y VWRSSVEPARKGYAGTMFLY+ LNP+++FPEI APTTMD EGRIITLE E 
Sbjct: 61 LSYFPNYLHVWRSSVEPARKGYAGTMFIiYKNTLNPVITFPEIGAPTTMDAEGRIITLEFE 120 

Query: 121 NCYITQVYTPNAGDGLKRLADRQITOIKYAEYIATLDSQKPVl^TGDYSWAHKEIDIiANP 180 

+ ++TQVYTPNAGDGL+RL DRQIWD KYA+YL LD+QKPVLATGDYNVAHKEIDLANP 
Sbjct: 121 DFFVTQVYTPNAGDGLRRLDDRQITOHKYADYLTEI£iAQKP\n^TGDYNVAHKEIDLANP 180 
Query: 181 SSNRRSAGFTAEERQGFTNLIi^GFTDTFRYLHGDVPWATSVWAQRSRTSKII^GWRID 240

+SNRRS GFT EERQGFTNLLA+GFTDTFR++HGD+P+VY+WWAQRS+TSKINNTGWRID 
Sbjct: 181 NSNRRSPGFTDEERQGFTNLLARGFTDTFRHVHGDIPHVYTMAQRSKTSKINHTGWRID 240 

Query: 241 YWLTSNRVADKITKSEMIHSGDRQDHTPIILEIEL 275

YWL SNR+ DK+ +SEMI SG+RQDHTPI+L+I+L 
Sbjct: 241 YWLASNRLVDKVKRSEMISSGERQDHTOILLDIDIi 275 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics.

Example 719 

A DNA sequence (GBSx0763) was identified in S.agalactiae <SEQ ID 2211> which encodes the amino
acid sequence <SEQ ID 2212>. Analysis of this protein sequence reveals the following:

Possible site: 39

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -7.96 Transmembrane 28 - 44 ( 22 - 49)

Final Results

bacterial membrane --- Certainty=0.4185 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



A related GBS nucleic acid sequence <SEQ ID 8645> which encodes amino acid sequence <SEQ ID 8646>
was also identified. Analysis of this protein sequence reveals the following:

Lipop Possible site: -1 Crend: 5 
McG: Discrim Score: 17.78 
GvH: Signal Score (-7.5): -4.55 

Possible site: 55 
>>> Seems to have an uncleavable N-term signal seq
ALOM program count: 1 value: -7.96 threshold: 0.0 

INTEGRAL Likelihood = -7.96 Transmembrane 8 - 24 ( 2 - 29) 
PERIPHERAL Likelihood = 9.28 138 
modified ALOM score: 2.09 

*** Reasoning Step: 3 

Final Results 

bacterial membrane --- Certainty=0.4185 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAD11512 GB:U60828 unknown [Lactococcus lactis] 
Identities = 53/240 (22%) , Positives = 102/240 (42%) , Gaps = 24/240 (10%) 

Query: 65 PTILIPGSSATQERFNSMIAQL NQMGE KHS VLKLTVKKDNSI I YNGQI SGNDHKPY 120
PTI I GS + +4- +L N +K V+ + K+ + GQIS ++ P
Sbjct: 64 PTIYIGGSGGNVTSIDWLVERLLPIKNISSQKSLVMTSNITKNYELKVEGQISQDNKYPI 123

Query: 121 IVIGFENNEDGYSNIKKQTKWLQIAMNDLQKKYKFIOIFNAIGHSNGGLSWTIFLEDYYDS 180
I G ++ + +K LQ + L + Y4 N +G+S+G +4- D ++
Sbjct: 124 IEFA TVKGTNSGELFSKGLQKI IVYLTENYQVPWINLVGYS SGATGAVYYMMDTGNN 180

Query: 181 DEFD-MKBLLTMGTPFNFEES NTSN HTQMLKDLI SNKGNI PSSLMVY 226
F + +++ +N E + + SN T+M + + N + S +
Sbjct: 181 PNFPPVNKWSLDGEYNNETNLQLGESLSm'LKEGPIVKTEMYQYIADNYQKVSSKTQML 240

Query: 227 NLAGT- -NSYDGDKIVPFASVETGKYIFQETAKHYTQLTVTGNNATHSDLPDNPEVIQYV 284
L Q + D +P+A + ++F++ T T+ +HS P NP V++YV
Sbjct: 241 LLEGNFNSEKQTDSAIPWADSFSIYHLFKKNGNEITT-TLYPTKTSHSQAPKNPTWKYV 299

No corresponding DNA sequence was identified in S.pyogenes.

SEQ ID 8646 (GBS219) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 43 (lane 3; MW 31.6kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 47 (lane 7; MW 56kDa).

GBS219-GST was purified as shown in Figure 203, lane 5. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 720 

A DNA sequence (GBSx0764) was identified in S.agalactiae <SEQ ID 2213> which encodes the amino
acid sequence <SEQ ID 2214>. This protein is predicted to be PTS system, cellobiose-specific IIC 
component. Analysis of this protein sequence reveals the following: 



Possible site:

[The signal-sequence line and most INTEGRAL likelihood values are not legible in the source; the surviving transmembrane assignments are listed below]

Transmembrane 200 - 216 ( 197 - 226)
Transmembrane 157 - 173 ( 156 - 175)
Transmembrane 307 - 323 ( 306 - 332)
Transmembrane 131 - 147 ( 126 - 148)
Transmembrane 375 - 391 ( 370 - 396)
Transmembrane 101 - 117 ( 98 - 119)
Transmembrane 326 - 342 ( 324 - 342)
INTEGRAL Likelihood = -0.16 Transmembrane 71 - 87 ( 71 -



Final Results 

bacterial membrane --- Certainty=0.4057 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAC74807 GB:AE000268 PEP-dependent phosphotransferase enzyme II
for cellobiose, arbutin, and salicin [Escherichia coli K12]
Identities = 60/197 (30%), Positives = 83/197 (41%), Gaps = 12/197 (6%)

Query: 209 LAIFLTLSGLFVPDIL--FRPYSYFSWSENIjN7AALSQHTDKIPYLYTFYTVKNSFAMFG 266 

LA+ +G+ PL Y + V LA + H PL +SF G 

Sbjct: 253 LALTALDNGIMTPWALENIATYQQYGSVEAALAAGKTFHIWAKPML DSFIFLG 305 

Query: 267 GIGILLSLFLAVLYESRKLQSKNYYKLTLLTLTPLIFDQNLPFLVGLPVILQPILFIPMV 326 

G G L L LA+ SR+ 4Y ++ L L IF N P L GLP+I+ P++FIP V 
Sbjct: 306 GSGATLGLILAIFIASRRA DYRQVAKLALPSGIFQINEPILFGLPIIMNPVMFIPFV 362 

Query: 327 LTTIFAEAFGALMLYLKFVDPAVYTVPSGTPSLLFGFLASNGDWRYLPVTAIILvVGFFI 386 

L A Y++P PP+LF+NG LV L+I 

Sbjct: 363 LVQPIIiAAITIiAAYYMGIIPPVTWrAPTOMPTGLGAFFWIWGSVAALLVALFNLGIATLI 422 

Query: 387 YRPFVKIAFAKEEQYEK 403 

Y PFV +A + +K 
Sbjct: 423 YLPFVWANKAQNAIDK 439 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 721

A DNA sequence (GBSx0765) was identified in S.agalactiae <SEQ ID 2217> which encodes the amino 
acid sequence <SEQ ID 2218>. Analysis of this protein sequence reveals the following:



>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=[illegible] (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.



Example 722 

A DNA sequence (GBSx0766) was identified in S.agalactiae <SEQ ID 2219> which encodes the amino
acid sequence <SEQ ID 2220>. Analysis of this protein sequence reveals the following: 



Possible site: 39

[The signal-sequence line and the INTEGRAL likelihood values are not legible in the source; the surviving transmembrane assignments are listed below]

Transmembrane 188 - 204
Transmembrane 105 - 121
Transmembrane 212 - 228
Transmembrane 124 - [illegible]

Final Results

bacterial membrane --- Certainty=0.3314 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



A related GBS nucleic acid sequence <SEQ ID 8647> which encodes amino acid sequence <SEQ ID 8648> 
was also identified. Analysis of this protein sequence reveals the following: 



Lipop Possible site: [illegible] Crend: 6
McG: Length of UR: 5
Peak Value of UR: 2.99
Net Charge of CR: 4
McG: Discrim Score: 6.88
GvH: Signal Score (-7.5): -2.86

Possible site: 30
>>> Seems to have an uncleavable N-term signal seq
Amino Acid Composition: calculated from 1
ALOM program count: 5 value: -5.79 threshold:

INTEGRAL Likelihood = -5.79 Transmembrane 179 - 195 ( 170 - 197)
INTEGRAL Likelihood = -5.36 Transmembrane [illegible] - 112 ( 95 - 118)
INTEGRAL Likelihood = -4.41 Transmembrane 203 - 219 ( 201 - 220)
INTEGRAL Likelihood = -3.45 Transmembrane [illegible] - 79 ( 60 - 80)
PERIPHERAL Likelihood = 0.10
modified ALOM score: 1.6
icml HYPID: 7 CFP: 0.331

*** Reasoning Step: 3

Final Results

bacterial membrane --- Certainty=0.3314 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 2221> which encodes the amino acid
sequence <SEQ ID 2222>. Analysis of this protein sequence reveals the following: 

Possible site: 30 

>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood =-11.20 Transmembrane 179 - 195 ( 173 - 201) 

INTEGRAL Likelihood = -3.66 Transmembrane 95 - 112 ( 95 - 113) 

INTEGRAL Likelihood = -1.44 Transmembrane 203 - 219 ( 203 - 219) 

INTEGRAL Likelihood = -0.96 Transmembrane 115 - 131 ( 115 - 131) 

INTEGRAL Likelihood = -0.64 Transmembrane 63 - 79 ( 63 - 79) 



--- Certainty=0.5479 (Affirmative) < succ>



The protein has no significant homology with any sequences in the GENPEPT database. 
An alignment of the GAS and GBS proteins is shown below: 

Identities = 160/228 (70%) , Positives = 185/228 (80%) 

Query: 10 MSKKSHRQYQIYEGLRCAVALCFISGYINAFTYVTQGKRFAGVQTGNLLSFAIHLSNKHY 69 

MSKK + YQ+YEGLRCA+ LCFISGY+NAFTY+TQGKRFAGVQTGNLLSFAI LS + 
Sbjct: 1 MSKKKRKHYQVYEGLRCAMTLCFISGYVNAFTYMTQGKRFAGVQTGNLLSFAIRLSEQQL 60 



Query: 70 SQALAFLLPIMVFMLGQSFTYFi'IKRWANKHQLffi\TLLSSFALTQVAIVTlII

+AL FLLP+ + VFMLGQS FTYFM+RWA K LHWYLLSS LT +A T + TPFLPS+ 
Sbjct: 61 KEALQFLLPMIVFMLGQSFTYFMHRWATKKGLHWYLLSSVILTGIAFGTALFTPFLPSNV 120 

Query: 130 TVAGLAFFASIQVDTFKSLRGAPYANMvIMTGNIKNAAYLLTKGLYEKNSDIFLIARNTII 189 

TVA LAFFASIQVDTFK4LRGA YAN+MMTGNIKNARYLLTKGLYEKN ++ I RNT+I 
Sbjct: 121 TVAALAFFASIQVDTFKTLRGASYANWIMTGNIKNAAYLLTKGLYEKNHELTHIGRNTLI 180 

Query: 190 IIGGFIFGWCSTYFSSKLGEWSLSLILIPLLYVNLLLGHEFYNLQVE 237 

+1 F GWCST GE++L IL+PLLYVN LL EFY++Q 4- 

Sbjct: 181 VILAFAVGWCSTLLCIAYGEYALMPILMPLLYVNYLLAQEFYHIQTK 228 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
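
The "INTEGRAL Likelihood ... Transmembrane" lines reported above (for example, for SEQ ID 2222) each give a likelihood value and two residue ranges. The sketch below, offered only as an illustration, extracts those fields and orders the segments along the sequence. The function name `parse_segments` and the dictionary keys are hypothetical, and reading the parenthesised range as an extended boundary around the first range is an assumption of this sketch rather than a statement from the source.

```python
import re

# Hypothetical pattern for lines such as
#   INTEGRAL Likelihood =-11.20 Transmembrane 179 - 195 ( 173 - 201)
SEGMENT = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)


def parse_segments(text):
    """Return predicted segments sorted by start position of the first range."""
    segments = []
    for m in SEGMENT.finditer(text):
        start, end, ext_start, ext_end = (int(g) for g in m.groups()[1:])
        segments.append({
            "likelihood": float(m.group(1)),
            "core": (start, end),              # first range on the line
            "extended": (ext_start, ext_end),  # parenthesised range (assumed meaning)
        })
    return sorted(segments, key=lambda s: s["core"][0])


example = """INTEGRAL Likelihood =-11.20 Transmembrane 179 - 195 ( 173 - 201)
INTEGRAL Likelihood = -3.66 Transmembrane 95 - 112 ( 95 - 113)"""
for seg in parse_segments(example):
    print(seg["core"], seg["likelihood"])
```

Lines in which a number has been lost to scanning damage will not match the pattern and are skipped rather than guessed.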

Example 723 

A DNA sequence (GBSx0767) was identified in S.agalactiae <SEQ ID 2223> which encodes the amino 
acid sequence <SEQ ID 2224>. This protein is predicted to be tellurite resistance protein. Analysis of this 
protein sequence reveals the following: 

Possible site: 20 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -0.00 Transmembrane 190 - 206 ( 190 - 206) 

Final Results 

bacterial membrane --- Certainty=0.1001 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:AAC22923 GB:U32807 tellurite resistance protein (tehB) 
[Haemophilus influenzae Rd] 
Identities = 164/282 (58%) , Positives = 205/282 (72%) , Gaps = 1/282 (0%) 

Query: 7 LLPYKTMPVWTAQS IPKAFLEKHNTKEGTWAKLTILSGSLVFYQLSPDGEE I SRHI FDAS 66
L+ YK MPVWT ++P+ F EKHNTK GTW KLT+L G L FY4L+ +G4 1+ HIF
Sbjct: 5 LICTKQMPVVWKDNLPQMFQEKHNTICVGTWGKLTVLKGKLKFYELTENGDVIAEHIFTPE 64

[Rows Query 67 / Sbjct 65, Query 127 / Sbjct 124, Query 187 / Sbjct 184 and Query 247 / Sbjct 244: only the following match-line fragments are legible in the source]
S IPFV+PQ WH4V S DL C L FYC+KEDYF KKY T
+LDLGCGQGRNSLYLSLLG+ VTS D ti
+YDFI+STWFMFLN + + II M+ HT +GGYNLIV+AM T + PCPLPF FTF E
+LK YY DWE ++YNEN+GELH+ DENGNR+K++FAT+IARK



No corresponding DNA sequence was identified in S.pyogenes. 

SEQ ID 2224 (GBS95) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 5 (lane 3; MW 35.6kDa) and in Figure 12 (lane 4; MW 35.6kDa). The GBS95- 
His fusion product was purified (Figure 191, lane 7) and used to immunise mice. The resulting antiserum 
was used for FACS (Figure 292), which confirmed that the protein is immunoaccessible on GBS bacteria. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.



Example 724 

A DNA sequence (GBSx0768) was identified in S.agalactiae <SEQ ID 2225> which encodes the amino 
acid sequence <SEQ ID 2226>. This protein is predicted to be methionyl-tRNA synthetase (metS). Analysis
of this protein sequence reveals the following: 

Possible site: 47 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -0.32 Transmembrane 473 - 489 ( 473 - 489)

Final Results

bacterial membrane --- Certainty=0.1128 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10043> which encodes amino acid sequence <SEQ ID 
10044> was also identified. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAB11814 GB:Z99104 methionyl-tRNA synthetase [Bacillus subtilis] 
Identities = 395/667 (59%), Positives = 501/667 (74%), Gaps = 12/667 (1%)

Query: 20 EKKSFYITTPIYYPSGKLHIGSAYTTIACDVLARYKRMMGFDVQYLTGLDEHGQKIQQKA 79 

E +FYITTPIYYPSGKLHIG AYTT+A D +ARYKR+ GFDV+YLTG DEHGQKIQQKA 
Sbjct: 4 ENlWFYITTPIYYPSGKIiHIGHAYTTOAGDAMARYKRLKGFDVRYLTGTDEHGQKIQQKA 63 
Query: 80 EEAGITPQEyVDGMaESVKTLIvELLDISYDKFIRTTDTYHEEAVAXIFEQLLAQGDIYLG 139
E+ ITPQEYVD A ++ LW+ L+IS D FIRTT+ H+ + K+F++LL GDIYL
Sbjct: 64 EQENITPQEYVDRAAADIQPCLWKQLEISNDDFIRTTEKRHKWIEKVFQKLLDNGDIYLD 123

[Rows Query 140 / Sbjct 124, Query 199 / Sbjct 184, Query 259 / Sbjct 244, Query 319 / Sbjct 303 and Query 379 / Sbjct 363: only the following match-line fragments are legible in the source]
EY GWYS+ DE F+TE+QIi ++ R+E G +IGG +P SGH VE + EESYFFRM KYADR
L YY E+P FIQP+ R NEM+ NFI + P3LEDLAVSRTT+ WGV+VP NPKHV+YVWIDA
L NY++ALGY +D Y K+WPAD+H++GK+ 1 +RFK+ IYWPIMLMALDLPLPK++ AH
GW +M+DGKMSKSKGNW P L+ER+GLD LRYYL+R +P GSDG FTPE +V RINY+
IAHTOLGNLLNRT+AM+NKYFDG++

Query: 438 ALFJVVWNLISRTNKYIDETAPWV1AKDETDRDKIAAVMSHLVASLRWAHLIQPFMMETS 497
AL +W LISRTNKYIDETAPWVLAKD ++L +VM HL SLR+ A L+QPF+ +T
Sbjct: 423 ALSTLWQLI SRTNKYIDETAPWVIAKDPAKEEELRSVMYHLAESLRI SAVLLQPFLTKTP 482

[Rows Query 498 / Sbjct 483, Query 555 / Sbjct 540, Query 615 / Sbjct 599 and Query 675 / Sbjct 657: only the following match-line fragments are legible in the source]
3 P+FPRL+ E+EI YIK +M
RQ++SGIAK Y E ELVGKKL V NLKP K ++ +SQGMIL+ E DG L V+++D +



A related DNA sequence was identified in S.pyogenes <SEQ ID 2227> which encodes the amino acid 
sequence <SEQ ID 2228>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1245 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 516/665 (77%) , Positives = 573/665 (85%) , Gaps = 4/665 (0%) 

Query: 21 KKSFYITTPIYYPSGKLHIGSAYTTIACDVIARYKRMMGFDVQYLTGLDEHGQKIQQKAE 80 

KK FYITTPIYYPSGKLHIGSAYTTIACDVLARYKR+MG +V YLTGLDEHGQKIQ KA+ 
Sbjct: 3 KKPFYITTPIYYPSGKLHIGSAYTT1ACDVLARYKRIMGHEVFYLTGLDEHGQKIQTKAK 62 

Query: 81 EAGITPQEYVKSMAESVKTLWELLDISYDKFIRTTDTyHEEAVAKIFEQLLAQGDIYLGE 140 

EAGITPQ YVD MA+ VK LW+LLDISYD FIRTTD YHEE VA +FE+LLAQ DIYLGE 
Sbjct: 63 EAGITPQIYVBNMAKDVKALWQLIjDISYDTFIRTTDDYHEEWAAVFEKLIAQDDIYLGE 122 
[Rows Query 141 / Sbjct 123, Query 201 / Sbjct 183, Query 261 / Sbjct 243, Query 320 / Sbjct 302, Query 380 / Sbjct 362, Query 439 / Sbjct 422, Query 499 / Sbjct 482, Query 550 / Sbjct 542, Query 618 / Sbjct 602 and Query 678 / Sbjct 662: only the following match-line fragments are legible in the source]
Y+GWYSVSDEEFFTESQL EV+RDE+G + IGG+APSGHEVE VSEESYF R+SKY DRL
NY +ALGY ++
+ HM+GKDILRFHSIYWPI+LM LDLP+P RL+AHG
WFVM+DGKMSKSKGNVWPEKLVERFGLDPLRYYLI4RSLPVGSDGTFTPEDYVGRINYEL
LEAVW +I+RTNKYIDETAPWVLAK++ D+ +LA+VM+HL ASLR+VAH+ 1 QPFMMETS
DL L AD ? +WAKG+PIFPRLDME EI YIK QM
EKEWVPEEV L S K 1 FE FDAVE I RVAEV EV KVEGS+KLLRFR+DAGD
LSGIAKFYPNKQEI.WAI I < > I ll F r II MIL EH +LTVLTVDS+V I



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.



Example 725 

A DNA sequence (GBSx0769) was identified in S.agalactiae <SEQ ID 2229> which encodes the amino 
acid sequence <SEQ ID 2230>. Analysis of this protein sequence reveals the following:

Possible site: 35

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2633 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 726

A DNA sequence (GBSx0770) was identified in S.agalactiae <SEQ ID 2231> which encodes the amino
acid sequence <SEQ ID 2232>. This protein is predicted to be branched chain amino acid transport system 
II carrier protein (brnQ). Analysis of this protein sequence reveals the following: 



Possible site: 26
>>> Seems to have a cleavable N-term signal seq.

[The INTEGRAL likelihood lines are only partially legible in the source; the surviving transmembrane ranges are 279 - 295 and 345 - 351]

Final Results

bacterial membrane --- Certainty=0.5965 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



A related GBS nucleic acid sequence <SEQ ID 9407> which encodes amino acid sequence <SEQ ID 9408> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAC00400 GB:AF008220 branch-chain amino acid transporter 
[Bacillus subtilis] 

Identities = 130/367 (35%) , Positives = 204/367 (55%), Gaps = 12/367 (3%) 

Query: 1 MSEKFSPWFSLTFLVILYLTIGPLFAIPRTATVSFEIGVAPIVGHSP--IALLCFTACFF 58 

+++K P F F V+LYL+IGPLFAIPRT TVS+EIG P + P ++LL FT FF 
Sbjct: 73 LADKAHPVFGTIFTWLYLSIGPLFAIPRTGWSYEIGAVPFLTGVPERLSLLIFTLIFF 132 

Query: 59 AAAYYLAIRPNGILDSVGKILTPVFAFLILSLVWGAIAYGNLESAKASADYAGKAFGSG 118 

YYLA+ P+ ++D VGKILTP+ F 1+ ++V+ AI + Y G G 

Sbjct: 133 GVTYYLALNPSKWDRVGKILTPI - KFTI ILI I VLKAI FTPMGGLGAVTEAYKGTPVFKG 191 

Query: 119 VIAGYNTLDALAAVAFCLVATETLKKFGFKTKKEYLSTIWIVGIVTSLAFSILYIGLGFL 178 

L GY T+DALA++ F +V +K G K + G++ +L + +Y+ L +L 

Sbjct: 192 FLEGYKTMDALASIWGVVvWAvKSKGVTQSKALAAACIKAGVIAALGLTFIYVSLAYL 251 

Query: 179 GNKFPVPADILADPNVNKGAYVLSQASYKLFGNFGRYFLSIMVTLTCFTTTVGLIVSVSE 238 

G A V +GA +LS +S+ LFG+ G L +T+ C TT++3L+ S + 

Sbjct: 252 G ATSTNAIGPVGEGAKILSASSHYLFGSLGNIVLGAAITVACLTTSIGLVTSCGQ 306 

Query: 239 FFDKNFRFGNYKLFATVFTLIGFLIANLGLNAVTTFSVPVLTLLYPIVIVIVLIILINKW 298 

+F K +YK+ T+ TL +IAN GL +1 FSVP+L+ +YP+ IVI+++ I+K 

Sbjct: 307 YFSKLIPALSYKIWTIVTLFSLIIANFGLAQIIAFSVPILSAIYPLAIVIIVLSFIDKI 366 

Query: 299 LPLSKK GMSLTIGLVTLVSFVEVLAGQWQEKTLTQLVGFLPFHTISMGWLVPMLIGI 355 

++ + GL +++ ++ AG L LP +++ +GW++P ++G 

Sbjct: 367 FKERREVYIACLIGTGLFSILDGIKA-AGFSLGSLDVFLNANLPLYSLGIGWVLPGIVGA 425 

Query: 356 VFSLVLS 362 

Sbjct: 426 VIGYVLT 432 

There is also homology to SEQ ID 2234. 
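
The summary lines that head the alignments in this section (for example `Identities = 130/367 (35%), Positives = 204/367 (55%), Gaps = 12/367 (3%)`) can be checked mechanically, which helps to catch digits damaged in scanning. The sketch below is a minimal illustration in Python; the function names are hypothetical. It assumes that the printed percentages are truncated rather than rounded, and that the Positives percentage is the truncated Identities percentage plus the truncated percentage of the remaining conservative positions. That rule is an empirical reading of the numbers printed in this section, not a statement about how the alignment program is implemented.

```python
import re

# Matches "Identities = 130/367 (35%)" and tolerates stray spaces from scanning.
FIELD_RE = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)\s*\((\d+)%\)"
)


def parse_summary(line):
    """Map each field name to (count, alignment_length, printed_percent)."""
    return {
        name: (int(count), int(length), int(percent))
        for name, count, length, percent in FIELD_RE.findall(line)
    }


def expected_percents(fields):
    """Recompute the percentages under the truncation rule assumed above."""
    ident, length, _ = fields["Identities"]
    positives = fields["Positives"][0]
    ident_pct = 100 * ident // length
    expected = {
        "Identities": ident_pct,
        # Assumption: conservative (non-identical) positions are truncated separately.
        "Positives": ident_pct + 100 * (positives - ident) // length,
    }
    if "Gaps" in fields:
        expected["Gaps"] = 100 * fields["Gaps"][0] // length
    return expected


def is_consistent(line):
    """True when every printed percentage agrees with the recomputed one."""
    fields = parse_summary(line)
    expected = expected_percents(fields)
    return all(fields[name][2] == value for name, value in expected.items())


line = "Identities = 130/367 (35%) , Positives = 204/367 (55%), Gaps = 12/367 (3%)"
print(parse_summary(line))
print(is_consistent(line))
```

A summary line that fails this check is a candidate for a mis-scanned digit and is worth comparing against the original page image.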

A related GBS gene <SEQ ID 8649> and protein <SEQ ID 8650> were also identified. Analysis of this 
protein sequence reveals the following:

Lipop: Possible site: -1 Crend: 3
SRCFLG: 0
McG: Length of UR: 30
Peak Value of UR: 2.99
Net Charge of CR: 2
McG: Discrim Score: 13.17
GvH: Signal Score (-7.5): -3.3
Possible site: 33
>>> Seems to have an uncleavable N-term signal seq
Amino Acid Composition: calculated from 1
ALOM program count: 11 value: -14.91 threshold: 0.0
INTEGRAL Likelihood = -14.91 Transmembrane 347 - 363 ( 337 - )
INTEGRAL Likelihood = -9.98 Transmembrane 150 - 166 ( 142 - 170)
INTEGRAL Likelihood = -7.54 Transmembrane 40 - ( 36 - 61)
INTEGRAL Likelihood = -6.64 Transmembrane 79 - 95 ( 76 - )
INTEGRAL Likelihood = -6.00 Transmembrane - 241 ( 221 - 247)
INTEGRAL Likelihood = -4.30 Transmembrane - ( 113 - 134)
INTEGRAL Likelihood = -4.14 Transmembrane 319 - 335 ( 318 - )
INTEGRAL Likelihood = -4.09 Transmembrane 376 - 392 ( 373 - 394)
INTEGRAL Likelihood = -2.92 Transmembrane - 23 ( 6 - )
INTEGRAL Likelihood = -2.55 Transmembrane 285 - 302 ( 284 - 305)
INTEGRAL Likelihood = -1.38 Transmembrane 194 - 210 ( 194 - 210)
PERIPHERAL Likelihood = 2.49 402
modified ALOM score: 3.48
icml HYPID: 7 CFP: 0.696

*** Reasoning Step: 3

Final Results

bacterial membrane --- Certainty=0.6965 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 

ORF00247(304 - 1596 of 1941) 

omni |ntoibs3447 (19 - 446 of 459) branched chain amino acid transport system II carrier 

protein 

%Match =21.7 

%Identity =38.8 %Similarity = 61.2 

Matches = 166 Mismatches = 157 Conservative Sub.s = 96 

93 123 153 183 213 243 273 303 

VLTVDSAVANGSIIG*SKRALCSFFVFKKKVTE*LENYENDLEFIFIFDIIKDIDSKHLDRI**GEFMERV*IDYLH*WL 

LTEYFNII IRRI FFMKHS 
10 

333 363 393 423 453 483 513 543 

LMVKKGFLTGLLLFGIFFGAGNLIFPPALGVASGQDFWPAILGFCLSC-VGLAIITLLLGTLTNGGYKTEMSEKFSPWFSL 
I II = |::|| =111111=1=11 11 l=l== I II II 1=1111 == == II I ===l I I 
LPVKDTIIIGFMLFALFFGAGNMIYPPELGQMGH^mVKAIGGFLLTGVGLPLLGIIAIALTGKDAKG-LADKAHPVFGT 
30 40 50 60 70 80 90 

573 603 633 657 687 717 747 777 

TFLVILYLTIGPLFAIPRTATVSFEIGVAPIVGHSP- -IALLCFIACFFAAAYYLAIRPNGILDSVGKILTPVFAFLILS 

11 = 111 = 11111 I llll 111 = 111 I = I = = ll II II 1111= 1= = = l 1111111= 11 = 
IFTVvliYLSIGPLFAIPRTGTVSYEIGAVPFLTGVPERLSjLI FTLIFFGVTYYLALNPSKVVDRVGKILTPI-KF 
110 120 130 140 150 160 170 

801 831 861 891 921 951 981 1011 

LVWGAI - -AYGNLESAKASADYAGKAFGSGvIAGYiraLDAIAAVAFCLVATETLKKFGFKTKKEYLSTIWIVGIVTSLA 
::|: II I I « = I I llll I ' I I I I ' « I =1 =11 I = I = = =1 

IIVLKAIFTPMGGLGA- - vTEAYKGTPVFKGFLEGYKTMDAIAS!VFGVV\A7NAVKSKGVTQSKMiAAACIKAGvlAALG 
190 200 210 220 230 240 250 



1041 1071 1101 H31 1161 1191 1221 1251 

FSILYIGLGFLGNKFPVPADILADPNVNKGAYVLSQASYKLF^
270 280 290 300 310 320

1281 1311 1341 1371 1401 1431 1461 1488 

imO^ATVFTLIGFLIANLGLNAVITFSVPVLT^^^ 

:||; |: || ::|||:|| :| ||||:|: :||: :|:| : |: : | : : :| | : 

SYKIVVTI VTLFSLIIANFGLAQIIAFSVPILSAIYPLAIVI IVLSFIDK- - - IFKERREVYIACLTGTGLFSILDGIKA 
340 350 360 370 380 390 400 

1518 1536 1566 1596 1626 1656 1686 1716 

QEKTLTQLVGFL PFHTISMGWLVPMLIGIV?SLVLSDKQKGQAFDLSKFEG+HYFNFIDMSKRLKLRF*PFLYQIF 

:| | || |::: S :||::| ::| | || : 
AGFSLGSLDVFLNANLPLYSLGIGWVLPGIVGAVIGYVLTLFIGPSKQIiNEIS 
420 430 440 450 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 727 

A DNA sequence (GBSx0771) was identified in S.agalactiae <SEQ ID 2235> which encodes the amino 
acid sequence <SEQ ID 2236>. Analysis of this protein sequence reveals the following: 

N-terminal signal sequence 

Final Results

bacterial cytoplasm --- Certainty=0.3291 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10041> which encodes amino acid sequence <SEQ ID
10042> was also identified.

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics.

Example 728 

A DNA sequence (GBSx0772) was identified in S.agalactiae <SEQ ID 2237> which encodes the amino 
acid sequence <SEQ ID 2238>. Analysis of this protein sequence reveals the following:

Possible site: 39 
>>> Seems to have a cleavable N-term signal seq.

INTEGRAL Likelihood = -8.33 Transmembrane 117 - 133 ( 112 - 136) 
INTEGRAL Likelihood = -3.77 Transmembrane 53 - 69 ( 53 - 70) 
INTEGRAL Likelihood = -3.40 Transmembrane 98 - 114 ( 97 - 115) 

Final Results

bacterial membrane --- Certainty=0.4333 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 
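
The `INTEGRAL` lines reported for the sequences in these Examples follow one fixed pattern, so the predicted transmembrane segments can be collected and ordered along the protein with a few lines of code. The following is a minimal sketch in Python, not part of the original analysis; the names `Segment` and `parse_segments` are hypothetical, and reading the parenthesised range as the outer extent of each segment is an assumption.

```python
import re
from typing import NamedTuple


class Segment(NamedTuple):
    likelihood: float   # more negative = stronger membrane-span prediction
    start: int          # first residue of the reported segment
    end: int            # last residue of the reported segment
    outer_start: int    # parenthesised range (assumed to be the outer extent)
    outer_end: int


SEGMENT_RE = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)


def parse_segments(text):
    """Return the predicted segments sorted by position in the sequence."""
    segments = [
        Segment(float(lik), int(a), int(b), int(c), int(d))
        for lik, a, b, c, d in SEGMENT_RE.findall(text)
    ]
    return sorted(segments, key=lambda seg: seg.start)


report = """
INTEGRAL Likelihood = -8.33 Transmembrane 117 - 133 ( 112 - 136)
INTEGRAL Likelihood = -3.77 Transmembrane 53 - 69 ( 53 - 70)
INTEGRAL Likelihood = -3.40 Transmembrane 98 - 114 ( 97 - 115)
"""

segments = parse_segments(report)
strongest = min(segments, key=lambda seg: seg.likelihood)
for seg in segments:
    print(seg.start, seg.end, seg.likelihood)
```

Lines with a missing coordinate (as happens where the scan is damaged) simply do not match the pattern and are skipped rather than guessed.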

Example 729 

A DNA sequence (GBSx0773) was identified in S.agalactiae <SEQ ID 2239> which encodes the amino 
acid sequence <SEQ ID 2240>. Analysis of this protein sequence reveals the following:

Possible site: 15
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -4.19 Transmembrane 22 - 38 ( 20 - 44)

Final Results

bacterial membrane --- Certainty=0.2678 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 8651> which encodes amino acid
was also identified. Analysis of this protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 3
SRCFLG: 0
McG: Length of UR: 21
Peak Value of UR: 3.11
Net Charge of CR: 2
McG: Discrim Score: 11.30
GvH: Signal Score (-7.5): -5.35
Possible site: 28
>>> Seems to have an uncleavable N-term signal seq
Amino Acid Composition: calculated from 1
ALOM program count: 1 value: -4.19 threshold: 0.0
INTEGRAL Likelihood = -4.19 Transmembrane 5 - 21 ( 3 - 2
PERIPHERAL Likelihood = 6.74 53
modified ALOM score: 1.34
icml HYPID: 7 CFP: 0.268

Final Results

bacterial membrane --- Certainty=0.2678 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:CAB15623 GB:Z99122 spore coat protein (inner) [Bacillus subtilis] 
Identities = 71/359 (19%) , Positives = 148/359 (40%) , Gaps = 49/359 (13%) 

Query: 127 ISYRGNTSRYFDKKSLKVKFVTNKLKEKKHRI^GMPKESEWVIiHGPFLDRTLLRNYLSYN 186 
I+YRG+ R F KKS + F K + L+ + D +L+RN LS +

Sbjct: 47 IAYRGSHIRDFKKKSYHISFYQPKTFRGAREIH-- LNAEYKDPSLMRNKLSLD 97 

Query: 187 IAGEIMSYAPNVRYCELFVNGEYQGVYLAVBNIEQGEQRVPIKKSDKKLHKTPYIVAWDR 246 
E+ + +P + + +NG+ +GVYL +E++++ + +KL AD 

Sbjct: 98 FFSELGTLSPKAEFAFVKMNGKNEGVYLELESVDE YYLAKRKLADGAI FYAVDD 151

Query: 247 EHKAKQKLDNYVHYTHQSGISALDVKYPGKQRLTSKQLEFINKD 1NHIEKVLYSYD 302 

+ D + ++L++ Y +++ +++ +F IN + K + 

Sbjct: 152 DANFSLMSD LERETKTSLELGY- -EKKTGTEEDDFYLQDMIFKINTVPKAQFK- - 202 


Query: 303 FSQYPKYIDRESFANYFVINEFFRNVDAGKFSTYLYKDLRDRA-KLVVWDFNNAFDNQIE 361 

S+ K++D + + + F N D + LY+ +++ WD++ + I 

Sbjct: 203 -SEVTKHVDVDKYLRWLAGIWTSNYDGF\'HNY-&LYRSGETGLFEVIPWDYDAraGRDIH 261 

Query: 362 GRVDEADFTLTDAPWFNMLIKDKAFIDLVVHRYKELRKGVLATEYLSNYIDETRHFLGPA 421

Query: 422 IDROTKKWt3YVFDLKOTDPRNYLIPTERH-VTSYHKS\TEQLKDFIKKRGRWMDRNIETL 479

I Y++ P + P ++N + + + + + ++IK R +++ ++ L 

Sbjct: 313 IMAMYER IRPFVLMDPYKKNDIERFDREPDVICEYIKNRSQYLKDHLSIL 362 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.



Example 730 

A DNA sequence (GBSx0774) was identified in S.agalactiae <SEQ ID 2241> which encodes the amino 
acid sequence <SEQ ID 2242>. Analysis of this protein sequence reveals the following: 



Final Results

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 731 

A DNA sequence (GBSx0775) was identified in S.agalactiae <SEQ ID 2243> which encodes the amino 
acid sequence <SEQ ID 2244>. Analysis of this protein sequence reveals the following: 

Possible site: 21
>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -4.62 Transmembrane 5 - 21 ( 3 - 24)

Final Results

bacterial membrane --- Certainty=0.2848 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:BAB05949 GB:AP001514 unknown [Bacillus halodurans]
Identities = 199/697 (28%), Positives = 322/697 (45%), Gaps = 58/697 (8%)

Query: 57 KPFWKGVDVESSLAGYHHNDFPITQKTYRBWFHLISNMGANTTOVKVPMNVAFYDALYH 116 

K + GV++ G + I +K Y WF I MG N +RV FY AL 

Sbjct: 414 KKLQIHGVNLGMGKPGTFPGEAAIKE1C3YYRKFEQIGEMGGNAIRVYTLHPPGFYHALKR 473 


Query: 117 HNICASKRPLYLLQ/SIRIDSYRKNASITAFlTONYRGYLraiKAKGvVDILHGRKQVWNTDLG 176 

+N+ + P+YL G+ ID ++ AF++ ++E K +VD++HG V + + G 

Sbjct: 474 YNEQHENPIYLFHGVWIDEEPLEDTLDAFDEETNEEFQQ3MKRIVDVIHGNAW-DPNPG 532 

Query: 177 SRH--YHYDLSPWVLGYWGDDVMSGWAYTKTIQEKKT-QYKGRYFKTSVAANPFEVMLA 233

H Y D+SP+ +G+++G +W TV TN Y G+Y +T A PFE LA 

Sbjct: 533 HAHGVYQADVS PYT IGWI IGIEWYPHTVK&TNKNNPDIGDYDGKYVETK- DAEPFEYWLA 591

Query: 234
Sbjct: 592
Query: 293
Sbjct: 651
Query: 353
Sbjct: 706
Query: 412
Sbjct: 766

D h YE +Y W +SF+N TTD
++E+EQG+ ++E +E I G I WQD+W R K
¥ +AQ Q +GLL F

Query: 467 3DLYASSDESYLYLAIKTKPEKLKE KRLLPIDITPKSGSRKMNGSK-VTFSKS 520
LY DE YLY IK + +L +D P G+ + + VTF
Sbjct: 825 --LYMDHDERYLYFRIDMKSGSTDDFFKDGFPILVLDTLPGQGNEHIKEVEGVTFDHG 880

Query: 521
Sbjct: 881
Query: 581
637
Sbjct: 995
684
Sbjct: 1055

+ P+ N+ F++I+ L N -t
L+F DPS +++

No corresponding DNA sequence was identified in S.pyogenes. 

A related GBS gene <SEQ ID 8653> and protein <SEQ ID 8654> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 2
McG: Discrim Score: 12.00
GvH: Signal Score (-7.5): -5.46
Possible site: 21
>>> Seems to have an uncleavable N-term signal seq
ALOM program count: 1 value: -4.62 threshold: 0.0
INTEGRAL Likelihood = -4.62 Transmembrane 5 - 21 ( 3 - 24)
PERIPHERAL Likelihood = 7.32 223
modified ALOM score: 1.42

*** Reasoning Step: 3

Final Results

bacterial membrane --- Certainty=0.2848 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

SEQ ID 2244 (GBS62) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 5 (lane 7; MW 80.5kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 13 (lane 4; MW 105kDa). 

The GBS62-GST fusion product was purified (Figure 100A; see also Figure 193, lane 7) and used to
immunise mice (lane 1 product; 20 µg/mouse). The resulting antiserum was used for Western blot (Figure
100B), FACS (Figure 100C), and in the in vivo passive protection assay (Table III). These tests confirm
that the protein is immunoaccessible on GBS bacteria and that it is an effective protective immunogen.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 732 

A DNA sequence (GBSx0778) was identified in S.agalactiae <SEQ ID 2245> which encodes the amino 
acid sequence <SEQ ID 2246>. Analysis of this protein sequence reveals the following:

Possible site: 14
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -7.48 Transmembrane 310 - 326 ( 302 - 335)
INTEGRAL Likelihood = -7.32 Transmembrane 362 - 378 ( 361 - 380)
INTEGRAL Likelihood = -7.11 Transmembrane 334 - 350 ( 329 - 355)
INTEGRAL Likelihood = -2.28 Transmembrane 381 - 397 ( 380 - 397)

Final Results

bacterial membrane --- Certainty=0.3994 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10039> which encodes amino acid sequence <SEQ ID 
10040> was also identified. 

The protein has homology with the following sequences in the GENPEPT database: 

>GP:BAB05950 GB:AP001514 unknown conserved protein in others 

Positives = 226/405 (55%), Gaps = 5/405 (1%)

Query: 11 IVPAYNESTTIVSSIDSLLHLDYEAYEIIWDDGSSDNTSDVLKEEFALMKISNTIDSII 70 

+VPAYNE T 1+ ++ SLL L Y EI+W+DGS+D T +V+ E F ++K+ I I 
Sbjct: 69 LVPAYNEETGIIETWSLLSLKYPQTEIVWNDGSTDQTLEVIIEHFQMVKVGKVIRKQI 128 

Query: 71 ATQTCKDVFQRQVGKVKLTLIVKENGGKGDALNMGINAANYDYFLCLDADSMLQVDSLSQ 130

T+ K V+Q + L L+ K NGGK DALN G+N + Y YF +D DS+L+ D+L + 
Sbjct: 129 ETEPIKGVYQSTIFP-HLLLVDKSNGGKADALNAGLNVSKYPYFCSIDGDSILETDALLK 187 

Query: 131 ISKSIQV DPTVIAVGGLVQVAQGVKIEQGKVASYRLPWRIIPCAQALEYDSSFLGA 186 

+ K I + VIA GG V++A G 1+ G V S +L + Q +EY +FL 

Sbjct: 188 VMKPIVTSRDDEDEVIASGGNWIAKGSDIQMGSVLSVQLAKNPLVVMQVIEYLRAFLMG 247 

Query: 187 RIFLDYLRANLIISGAFGLFKKDLVKAVGGYDTQTLGEDMELVMKLHFFCRMNNIPYRIC 246 

RI L LIISGAF +F K V GGY +T+GEDMELV++LH + + RI 

Sbjct: 248 RIGLSRHNMVL 1 1 SGAFSVFAKKWVMEAGGYS KKTVGEDMEL VVRLHRLVKEKRLKKRIT 307 

Query: 247 YETDAVCWSQAPTNLGDLRKQRRRWYLGLYQCLKKYKSIFANYRFGAVGSISYIYYILFE 306 

+ D VCW++AP L++QR RW+ GL + L ++ + N ++G VG+ S Y+ + E 

Sbjct: 308 FVPDPVCWTEAPATFRVLQRQRSRWHRGLMESLWLHRGMTENPKYGLVGTASIPYFWIVE 367 

Query: 307 LLTPFIECFGIVIIFLSLLFNQLNIPFFISLVSLYIFYCVLITLSSFLHRIYSQQLVIGI 366 

P +E G + I + F L + F ++L L++ Y + ++++ + +S + + 
Sbjct: 368 FFGPVVELMGYLYIVFAFFFGGLYVEFALALFLLFVLYGTVFSMTAVILEGWSLKRYPKV 427 

Query: 367 LDIWVFYIAVFRYLILHPVLTFVKVASVIGYKNKKMVWGHITRE 411 

D+ ++ ++F L P+ + ++I + WG +TR+ 
Sbjct: 428 SDMSRLMIFSLFEALWYRPLTVLWRFGAIIEALFRSKAWGEMTRK 472 

A related DNA sequence was identified in S.pyogenes <SEQ ID 2247> which encodes the amino acid 
sequence <SEQ ID 2248>. Analysis of this protein sequence reveals the following: 

Possible site: 60
>>> Seems to have no N-terminal signal sequence

Likelihood =-1 Transmembrane - 49 ( 24 -
Likelihood Transmembrane - 3S2 ( 370 -
Likelihood Transmembrane - 360 ( 342 -
Likelihood Transmembrane - 79 ( 55 -
Likelihood Transmembrane 403 - 419 ( 403 -

Final Results

bacterial membrane --- Certainty=0.5416 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below: 

Identities = 84/397 (21%), Positives = 173/397 (43%), Gaps = 71/397 (17%) 

Query: 6 FRRKS I VPAYNEST - TI VSS IDSLLHLDYEAYEI I WDDGS SDNTSDVLKEEFALMKI SN 64 

++ +++P+YNE +++ ++ S+L Y EI +VDDGSS+ + L EE+ ++ 
Sbjct: 90 YKVAAVIPSYNEDAESLLETLKSVLAQTYPLSEIYIVDDGSSNTDAIQLIEEY VNR 145 

Query: 65 TIDSIIATQTCKDVFQRQVGKVKLTLIVKENGGKGDALNMGINAANYDYFLCLDADSMLQ 124 

+D C++V V +L+ N GK A ++ D FL +D+D+ + 

Sbjct: 146 EVD ICRNVI VHRSLV NKGKRHAQAWAFERSDADVFLTVDSDTYIY 190 

Query: 125 TOSLSQISKSIQVDPWIAVGGLVQVACGVKIECG3CVASYRLPWRIIPCAQALEYDSSFL 184 

++L ++ KS D TV A G + + ++ + YD++F 

Sbjct: 191 PNALEELLKSFN-DETVYAA TGHLNARNRQTNLLTRLTDIRYDNAF- 235 

Query: 185 GARI FLD YLRANL 1 1 - SGAFGLFKKD - LVKAVGGYDTQT LGEDMELVMKLHFF 235 

G L N+++ SG +++++ ++ + Y QT +G+D L 
Sbjct: 236 GVERTaAQSLTGNILVCSGPLS I YRREVI I PNLERYKNQTFLGLPVS IGDDRCLT 289 

Query: 236 CRNNNI PY -RI CYETCAVCWSQAPTNLGDLRKQRRRWYLGLY - QCLKKYKS I FANYRFGA 293 

N I R Y++ A C + P L KQ+ RW + + + K I +N 
Sbjct: 290 - -NYAIDLGRTVYQSTARCDTDVPFQLXSYLKQQNRWNKSFFKESIISVKKILSN P 343 

Query: 294 VGSISYIYYILFELLTPFIECFGIVI1FLSLLFNQLNIPFFISLVSLYIFYCV--LITLS 351 

+ ++ 1+ ++ 4+ +++ +LLFNQ + L+ L+ F + ++ L 

Sbjct: 344 I VALWT I FEWMFMM LIVAIGNLLFNQ AIQLDL I KLFAFLS 1 1 FI VALC 392 

Query: 352 SFLHRIYSQQLVIGILDIVKVFYIAVFRYLILHPVLT 388 

+H+ + + +++V + LL+ + T 

Sbjct: 393 RNVHYMI KHPASFLLSPLYGILHLFVLQPLKLYSLCT 429 



A related GBS gene <SEQ ID 8655> and protein <SEQ ID 8656> were also identified. Analysis of this 
protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 8
McG: Discrim Score: -5.18
GvH: Signal Score (-7.5): -4.91
Possible site: 14
>>> Seems to have no N-terminal signal sequence
ALOM program count: 4 value: -7.48 threshold:
INTEGRAL Likelihood = -7.48 Transmembrane 310 - 326 ( 302 - 335)
INTEGRAL Likelihood = -7.32 Transmembrane 362 - 378 ( 361 - 380)
INTEGRAL Likelihood = -7.11 Transmembrane 334 - 350 ( 329 - 355)
INTEGRAL Likelihood = -2.28 Transmembrane 381 - 397 ( 380 - 397)
PERIPHERAL Likelihood =
modified ALOM score: 2.00

*** Reasoning Step: 3

Final Results

bacterial membrane --- Certainty=0.3994 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

ORF00238(331 - 1401 of 1866) 

GP|5813901|gb|AAD52055.1|AF086783_3|AF086783 (52 - 367 of 412) IcaA {Staphylococcus e
%Match =10.3 

%Identity =34.8 %Similarity =55.9 

Matches = 109 Mismatches = 128 Conservative Sub.s = 66 

150 180 210 240 270 300 330 360 

VAMRRSSKLNLGWPPFACm**aVFOTANISSKAAm*TPTRR^TlTSWCLLAE*FIELLYHILFRRKSIVPAYNESTT 

- Mil I 

MQFFNFLLFYPVFMSIYWIVGSIYFYFTREIRYSLKKKPDINVDELEGITFLLACYNESET 



IVSSIDSLLHLDYEAYEIIWDDGSSDMTSDVL KEEFALMKISNTIDSIIATQTCKDVFQRQVGKVKLTLIVKENGG 

I :: ::| I II I I I = : I I I I I I I = : = : II = = « -II I 

IEDTLSTWIALKYEIGCEIIIINDGSSDNTAELiIYKIKENNDFIFVD LQENRG 



KGDAEHMGINAANYDYFLCLDADSMLQVDSLSQI SKS IQVDPTVIAVGGLVQVAQGVKIEQGKVASYRLPWRI I PCAQAL 

I HI II |:||| :|||||::: | : : s : : | | : | | | : : I : I = 
KANALNQGIKQASYDYVMCLDADTIVDQDAPYYMIEMFKHDPKLGAVTGNPRIRNKSSI LGKIQTI 

130 140 150 160 170 


861 891 918 948 978 1008 1038 1068 

EYDSSFLGARIFLDYLRANL-IISGAFGLFKKDLVKAVGGYDTQTLGEDMELVMKlHFFCRKn^IPYPaCYETDAVCWSQ 

II :h = l I =1111 llll I II =11 = II: > III III II 1 = 11 
EY-ASLIGCIKRSQTLAGAVNTISGVFTLFKKSAWDVGYWDTDMITEDIAVSWKLH LRGYRIKYEPLAMCWML 

190 200 210 220 230 240 250

1098 1128 1155 1194 1224 1254 1284 

APTNLGDLRKQRRRWYLGLYQCL-KKYKS1FANYRFG AVGSISYIYYILFELLTPFIECFGIVIIFLSLLFNQ 

I II I III II I == I = = I II =11 ==l =1= III III 

VPETLGGLWKQRVRWAQGGHEVLLRDFFSTMKTKRFPLYILMFEQIISILWVYIVLLYLGYLFI TANFBDYTFMT

270 280 290 300 310 320 

1311 1341 1371 1401 1431 1461 1491 1521 

MIP-FFISLVSLYIFYCVLITLSSFLHRIYSQQLVIGILDIVKWFYIAVFRYLILHPvIjTFVKVASVIGYKNKKMVWGH 
: |::| :: : |:= |= | :: = |=:

YSFSIFLLSSFTMTFINVIQFWALFIDSRYEKKNMAGLIFVSWYPTVYWIINAAVvIjVAFPIGUjKRKRGGYATWSSPDR 
340 350 360 370 380 390 400 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 733 

A DNA sequence (GBSx0779) was identified in S.agalactiae <SEQ ID 2249> which encodes the amino 
acid sequence <SEQ ID 2250>. Analysis of this protein sequence reveals the following:

N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2014 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAA22725 GB:AL035161 hypothetical protein SC9C7.13C
[Streptomyces coelicolor A3(2)]
Identities = 35/153 (22%), Positives = 64/153 (40%), Gaps = 5/153 (3%)

Query: 5 IRRARLGDEVNLAYIQTESWKAAFQKILPEDIIQKTTEIEPAITMYQQLIiHKEVGKGYIL 64

+R L D ++ 1+ W++A+ ++P+ + A G+ ++ 

Sbjct: 10 WEMTLADCDRVSLIRWGWQSAYRGLMPQPYLDAMDPAADAERRRSLFARPPEGRVNLV 59 

Query: 65 EVDSNPHCMAWWD KSREDGMLDYAEUCIHSLKEGWGKGYGSQMMNHVLSEIQQAG 120 

D + W + E D AEL ++ +G G G + + + AG 

Sbjct: 70 AEDEGGEWGWACHGPYRDGEARTAD-AELYALYVDAARFGAGIGRftLAGESVRRCRAAG 128 

Query: 121 YNKVI LWVFTENTRARKFYDRFGFS FKGKSKTY 153 

+ +++LWV N RAR+FYDR GF G + + 
Sbjct: 129 HARMLLWVLKGNVRARRFYDRAGFRPDGAEEPF 161 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 
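
Each "Final Results" block in these Examples lists three candidate locations with a certainty value, and the location marked "Affirmative" is the one with the highest certainty. The sketch below (Python, hypothetical function names, not part of the original analysis) shows one way such a block could be read into a dictionary. It tolerates stray spaces inside the number, which occur in scanned copies of this text.

```python
import re

RESULT_RE = re.compile(
    r"bacterial\s+(membrane|outside|cytoplasm)\b.*?"
    r"Certainty\s*=\s*([\d\s.]+?)\s*\((Affirmative|Not Clear)\)"
)


def parse_final_results(text):
    """Map location -> (certainty, verdict) for one Final Results block."""
    results = {}
    for location, raw_value, verdict in RESULT_RE.findall(text):
        certainty = float(re.sub(r"\s+", "", raw_value))  # "0 . 2014" -> 0.2014
        results[location] = (certainty, verdict)
    return results


def predicted_location(results):
    """Return the location with the highest certainty."""
    return max(results, key=lambda location: results[location][0])


block = """
bacterial cytoplasm --- Certainty=0.2014 (Affirmative) < succ>
bacterial membrane --- Certainty=0. 0000 (Not Clear) < succ>
bacterial outside --- Certainty=0 . 0000 (Not Clear) < succ>
"""

results = parse_final_results(block)
print(results)
print(predicted_location(results))
```

Where all three certainties are 0.0000, as in some Examples, `max` returns whichever location is listed first, so a caller that needs a real prediction should treat that case as undetermined.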

Example 734 

A DNA sequence (GBSx0780) was identified in S.agalactiae <SEQ ID 2251> which encodes the amino 
acid sequence <SEQ ID 2252>. This protein is predicted to be a DNA-binding protein. Analysis of this 
protein sequence reveals the following: 

N-terminal signal sequence 

Final Results

bacterial cytoplasm --- Certainty=0.1162 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 735 

A DNA sequence (GBSx0781) was identified in S.agalactiae <SEQ ID 2253> which encodes the amino 
acid sequence <SEQ ID 2254>. Analysis of this protein sequence reveals the following: 

N-terminal signal sequence 

Final Results

bacterial cytoplasm --- Certainty=0.2589 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10037> which encodes amino acid sequence <SEQ ID 
10038> was also identified. 

The protein has no significant homology with any sequences in the GENPEPT database. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 2255> which encodes the amino acid 
sequence <SEQ ID 2256>. Analysis of this protein sequence reveals the following: 

N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2767 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 80/86 (93%), Positives = 84/86 (97%) 

Query: 6 LKTIKENNMTFEEILPSLKAKKlKYraTGW 65 

+ +IKENNMTFEEILPGLKAKKKYWTGWGGAENYVQLFDTLEV+GKVLQATPYFLI+VT 
Sbjct: 3 ISSIKENNMTFEEILPGLKAKKKW^^ 62 



Query: 66 G 

G GEGFSMWAPTPCDVLAEDWIEVND 
Sbjct: 63 GAGEGFSMWAPTPCDVLAEDWIEVND 88 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 736 

A DNA sequence (GBSx0782) was identified in S.agalactiae <SEQ ID 2257> which encodes the amino 
acid sequence <SEQ ID 2258>. Analysis of this protein sequence reveals the following: 

>>> Seems to have an uncleavable N-term signal seq

Final Results

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:BAA85256 GB:AB021978 3-oxoacyl- [acyl carrier protein] reductase 
homolog [Moritella marina] 
Identities = 82/239 (34%) , Positives = 125/239 (51%) , Gaps = 15/239 (6%) 

Query: 2 TKVVLVTGCASGIGYAQAQYFLKQGYQVYGVDKSDKPNLN GNFNF- IKLDLSSDL 55
+K VLVTG + GIG A A++F KGVGS+ G+F ++L+++S
Sbjct: 5 SKTVIjVTGASRGIGRAIAEHFAKLGAWIGTATSAQGAERIGAYLGDAGFGLELNVTSQD 64

Query: 56
Sbjct: 65

Query: 110
M++++ G IIN+ S+ GG A Y ++K L GFT+ LA + A I + +APG
Sbjct: 124 GMMKQRHGRIINIGSWGTTGNGGQANYAAAKSGLLGFTFCSLASEVASRGITVNAVAPGF 183

Query: 170 VQTAMTASDFEPGGLAEWVASETPIGRWTKPSEVAELTGFLASGKARSMQGEIVKIDGG 228
+4-T MTA E + + ++ P R +E+AE GFLAS A + GE + ++GG
Sbjct: 184 IETDMTAELTEE- -QKQTILAQVPTSRLGSTTEIAETVGFLASDGASYrTGETIHVNGG 240

There is also homology to SEQ IDs 2628 and 7170. 

A related sequence was also identified in GAS <SEQ ID 9107> which encodes the amino acid sequence
<SEQ ID 9108>. Analysis of this protein sequence reveals the following:

Possible site: 1
>>> Seems to have an uncleavable N-term signal seq

Final Results

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 206/232 (88%), Positives = 224/232 (95%) 

Query: 1 MTKv^OTGCASGIGYAQAQYFLKQGYQVyGTOKSDKPlJLNGNFNFIKLDLSSDLSPLFT 60 

MTKWLVTGCASGIGYAQA+YFLKQG+ WGVDKSDKP+L+GNF+FIKLDLSS+L+PLF 
Sbjct: 4 MTKWLOTGCASGIGYAQARYFLKQGHHVYGVDKSDKPDLSGNFHFIKLDLSSEIiAPLFK 63 

Query: 61 ^WPTVDILCmAGII^AYKPLL^EVSDEELEHLFDINFFVITOLTRHYLRRMVEKKSGIII 120 

WP+VDILCNTAGILDAYKPLL+VSDEE+EHLFDINFF TV+LTRHYLRRMVEK+SG+II 
Sbjct: 64 WPSVDIL<^AGILDAYKPLLDV3DEEVEHLFDINFFATVKLTRHYLRRMVEKQSGVII 123 

Query: 121 NMCSIASFIAGGGGAAYTSSKHALAGFTRQLALDYAKDCIQIFGIAPGAVQTAMTASDFE 180 

NMCSIASFIAGGGG AYTSSKHALAGFTRQLALDYAKD I IFGIAPGAV-t-TAMTA+DFE 
Sbjct: 124 NMCSIASFIAGGGGVAYTSSKHAIAGFTRQLALDYAKDQIHIFGIAPGAVKTAMTANDFE 183 

Query: 181 PGGLAEWVASETPIGRWTKPSEVAELTGFIASGKARSMQGEIVKIDGGWSLK 232 

PGGLA+WVA ETPIGRWTKP EVAELTGFLASGKARSMQGEIVKIDGGW+LK 
Sbjct: 184 PGGLADWVARETPIGRWTKPDEVAELTGFLASGKARSMQGEIVKIDGGWTLK 235 
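
The "Identities" figure quoted for an alignment is the number of aligned columns in which the Query and Sbjct rows carry the same residue. As a minimal sketch (Python, hypothetical function name), the count for a single row can be reproduced as follows. The two strings are the final row of the alignment above; one residue of the Query row is read here as L, as the identity shown at that position in the middle line indicates, and that reading is an assumption about a scanning error.

```python
def identity_stats(query_row, subject_row):
    """Return (identical_columns, total_columns) for two aligned rows.

    Gap characters ('-') never count as identities.
    """
    if len(query_row) != len(subject_row):
        raise ValueError("aligned rows must have the same length")
    identical = sum(
        1 for q, s in zip(query_row, subject_row) if q == s and q != "-"
    )
    return identical, len(query_row)


# Final row of the GAS/GBS alignment (Query 181-232, Sbjct 184-235).
query_row = "PGGLAEWVASETPIGRWTKPSEVAELTGFLASGKARSMQGEIVKIDGGWSLK"
subject_row = "PGGLADWVARETPIGRWTKPDEVAELTGFLASGKARSMQGEIVKIDGGWTLK"

identical, columns = identity_stats(query_row, subject_row)
print(f"{identical}/{columns} identical columns in this row")
```

Summing the per-row counts over all rows of an alignment should reproduce the numerator of the "Identities" line, which gives a simple cross-check on rows whose characters were damaged in scanning.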

A related DNA sequence was identified in S.pyogenes <SEQ ID 9063> which encodes amino acid sequence 
<SEQ ID 9064>. An alignment of the GAS and GBS sequences follows: 

Score = 83.1 bits (202), Expect = 4e-18 

Identities = 72/258 (27%), Positives = 106/258 (40%), Gaps = 36/258 (13%) 

Query: 6 EVAFITGAASGIGKQIGETLLKEGKJVVFSDINQE KLDQWADYTKEGYDAFSW 60 

+V +TG ASGIG + LK+G V D + + ++D + + F++V 

Sbjct: 3 KVVI:VTGCASGIGYAQAQYFLKQGYQvYGVDKSDKPNLNGNFNFIKLDLSSDLSPLFTMV 62 

Query: 61 CDVTKEEAINAAIDTWEKYGRIDILVI^AG-LQHVAMIEDFPTEKFEFMIKIMliTAPFI IIS

+DIL N AG L ++ E+E+I 

Sbjct: 63 PTVDIIiCNTAGILDAYKPLLEVSDEELEHLFDINFFVTVR 102 

Query: 120 AIKRAFPTMKAQKHGRIINMASINGVIGFAGKSAYNSAKHGLIGLTKVTALEAADSGITV 179 

+ M +K G IINM SI I G +AY S+KH h G T+ AL+ A I + 

Sbjct: 103 LTRHYLRF^MVEKKSGIIINMCSIASFIAGGGGAAYTSSKHALAGFTRQLALDYAICOCIQI 162 

Query: 180 NAICPGYVDTPLVRGQFEDLSKTRGIPLENVLEEVLYPLVPQKRLIDVQEIADYVSFLAS 239 

IPGVT+ FE LE+ PR E+A+ FLAS 

Sbjct: 163 FGIAPGAVQTAMTASDFE PGGLAEMVASETPIGRWTKPSEVAELTGFLAS 212 

Query: 240 DKAKGVTGQACILDGGYT 257 

KA+ + G+ +DGG++ 
Sbjct: 213 GKARSMQGEIVKIDGGWS 230 

A further related DNA sequence was identified in S.pyogenes <SEQ ID 2259> which encodes the amino 
acid sequence <SEQ ID 2260>. An alignment of the GAS and GBS sequences follows: 



Query: 4 MTKWLVTGCASGIGYAQARYFLKQGHHVYCADKSDKPDLSGKFHFIKLDLSSELAPLFK 63 

MTKWLVTGCASGIGYAQA+YFI1KQG+ VYGVDKSDKP+Ij+GNF+FIKLDLSS+L+PLF 
Sbjct: 1 MTKWLVTGCASGIGYAQAQYFLKQGYQVYGVDKSDKPNLNGNFNFIKLDLSSDLSPLFT 60 

Query: 64 WPSVDILCNTAGILDAYKPLLDVSDEEVEHI.FDINFFATV^CLTRHYI,RR^WEKQSGVII 123

+VP+VDILCNTAGILDAYKPLL+VSDEE+EHLFDINFF TV+LTRHYLRRMVEK+SG+ 1 1 
Sbjct: 61 WPTVDILCNTAGrLDAYKPLLEVSDSELEHLFDINFFi/TVRLTRHYIjRRMVEKKSGIII 120 



Query: 124 roaCSIASFIAGGGGVAYTSSKHAIAGFTRQLALDYAKDQIHIFGIAPGAVKTAMTANDFE 183
NMCSIASFIAGGGG AYTSSKHALAGFTRQLALDYAKD I IFGIAPGAV+TAMTA+DFE 






Sbjct: 121 MVICSIASFIAGGGGAAYTSSKHAlAGFTRQLMilVAKDCIQIFGIAPGAVQTAMTASDFE 180 

Query: 184 PGGLADWVARETPIGRWTKPDEVAELTGFLASGPARSMQGEIVKIDGGWTLK 235 
PGGLA+WVA ETPIGRWTKP EVAELTGFLASGKARSMQGEIVKIDGGW+LK 
Sbjct: 181 PGGLAEWVASETPIGRWTKPSEVAELTGFLASGKARSMQGEIVKIDGGWSLK 232

SEQ ID 2258 (GBS251) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 43 (lane 2; MW 21.7kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 47 (lane 6; MW 52kDa). 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics. 

Example 737 

A DNA sequence (GBSx0783) was identified in S.agalactiae <SEQ ID 2261> which encodes the amino 
acid sequence <SEQ ID 2262>. Analysis of this protein sequence reveals the following: 

Possible site: 48
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -3.82 Transmembrane 62 - 78 ( 62 - 79)

Final Results

bacterial membrane --- Certainty=0.2529 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 738 

A DNA sequence (GBSx0784) was identified in S.agalactiae <SEQ ID 2263> which encodes the amino 
acid sequence <SEQ ID 2264>. Analysis of this protein sequence reveals the following:

Possible site: 31 

>>> Seems to have no N-terminal signal sequence 

Final Results

bacterial cytoplasm --- Certainty=0.1495 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:CAA20397 GB:AL031317 SC6G4.19C, unknown, len: 190 aa; contains
Pro-Ser-rich domain at N-terminus [Streptomyces
coelicolor A3(2)]
Identities = 26/80 (32%), Positives = 44/80 (54%), Gaps = 5/80 (6%)

Query: 1 MDSNDEAICIIEITKVDIVPFKDVSADHAFKEGEGDKTLEWWRKAHIDFF KPYFE 55

+DS + + +IE+T+V +VP +V HA EGEGD ++ WR H F+ + 
Sbjct: 103 VDSRERPVAVIEVTEVRWPIAEVDIAHAVDEGEGDTSVAGWRAGHERFWHGAEMRAALG 162 

Query: 56 EFGLMFSEDSRIVLEEFQW 75 
+ G + + +VLE F++V

Sbjct: 163 DPGFTVDDATPWLERFRIV 182

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics. 

Example 739 

A DNA sequence (GBSx0785) was identified in S.agalactiae <SEQ ID 2265> which encodes the amino 
acid sequence <SEQ ID 2266>. Analysis of this protein sequence reveals the following: 

Possible site: 40 

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -1.49 Transmembrane 3 - 19 ( 3 - 19)

Final Results

bacterial membrane --- Certainty=0.1595 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:BAB06422 GB:AP001516 unknown conserved protein [Bacillus halodurans]
Identities = 133/315 (42%), Positives = 191/315 (60%), Gaps = 4/315 (1%)

Query: 1 MKLAVLGTGMIVKEVLPVLQKI EG I DLVAI LSTVRSLETAKDLAKEYNMSLATSEYKAVL 60 

MK+A +GTC IV+ L L I+G VA+ S R TAX LA +YN+ ♦ + +L 
Sbjct: 1 MKIATVGTGPIVEAFLSALDD1DGPMCVAMYS- -RKETTAKPLADQVNIPTIYTHFDHML 58 

Query: 61 DNEE1 DTVY IGLPNHLHFDYAKEALLAGKHVI CEKPFTLEASQLEELVS I ANTRQLI LLE 120 

4 ++ VY+ PN LH+ +A +AL KHVICEKPFT A +LE L+S+A +L+L E 
Sbjct: 59 ADPNVEWYVASPNSIittQHALQALEHRKHVICEKPFTC^ 118 

Query: 121 AiraQYLPNFDLVKEHl^NIXSDIKIVECNYSQYSSRYDAFKRGEIAPAFNPEMGGGALRD 180 
AIT 4LPN+ L+KE++ LG IK+4+CNYSQYSSRYD F GE FNP GOAL D 

Query: 181 LNIYNI^LVIGLFGEPITAQYLPNIE-RGIDTSGVLVLDYGHFICIVCIGAKIX^SAEVKST 239 

+N+YN+H V+ LPG P A Y+ N GIDTSGVLVL Y HF + C+G KD + 
Sbjct: 179 INVYNIHFVM^FGPPEAAHYIANQHANGIDT£G\^VLKYPHFISECVGCKr/TCSMNF^ 238 

Query: 240 IQGDKGSIAILGPThm4PKISLTMNGQESHVYQUX3DRHP>™EF/IFEGIIS>^FKRA 299 

IQG+KG 1+ N + + ++QS+ D 4+ 4E4 4F44 

Sbjct: 239 I C^EKGYI HVENGANGCRNVKI YLDDQTSELNAQTNWILLYYETRTFYE - MYQAKNFEKC 297 

Query: 300 AQALEHSRTVMKVLD 314 

4 L 4S 4VM4V44 
Sbjct: 298 YELLSYSHSVKRVKE 312 

A related DNA sequence was identified in S.pyogenes <SEQ ID 719> which encodes the amino acid 
sequence <SEQ ID 720>. Analysis of this protein sequence reveals the following: 
Possible site: 40 

>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 
Identities = 233/314 (74%), Positives = 269/314 (85%)

Query: 1 MKLAVLGTCJMI VKEVLPVLQKI EG I DLVA I LSTVRSLETAKDLAKEYNMSLATSEYKAVL 60 
MKIAVl^TGAilVKEVLPVLQKI+GIDLVAII^TVRSL TAKDLAK + +M LATS+Y+A+L 
Sbjct: 1 MKI^VLGTGMIV!CEVLPVLQKIDGIDLVA:LSTVRSLTTAKDLAKAHHMPLATSKYEft.IL 60

Query: 61 DNEEIDTWIGLPNHLHFDYAKEALLAGKHVICEKPFTLEASQLEELVSIANTRQLILLE 120 

NEEIDTVYIGLPNHLHF YAKEALLAGKHVI CEKPFT+ A +L+ELV IA R+LILLE 
Sbjct: 61 GNEEIDTVYIGLPNHLHFAYAKEALLAGKtr/ICEKPFTMTAGELDELWIARI<RKLILLE 120 





Query: 121 AITNQYLPNFDLVKEHLSNLGDIKIVECNYSQYSSRYDAFKEGEIAPAFNPEMGGGALRD 180

AITNQYL N +KEHL LGDIKIVECNYSQY3SRYDAFKRG+IAPAFNP+MGGGALRD
Sbjct: 121 AITNQYLSMTFIKEHLDQLGDIKIVECNYSQYSSRYEAFKRGDIAPAFNPKMGGGALRD 180

Query: 181 LNIYNLHLVIGLFGEPITAQYLPNIERGIDTSGVLVLDYGHFKTVCIGAKDCSAEVKSTI 240

OTYN+H V+GLFG P T QYL N+B+GIDTSG+LV+DY FK VCIGAKDC+AE+KSTI
Sbjct: 181 LNIYNIHFWGLFGRPKTVQYLANVEKGIDTSGMLVMDYEQFKWCIGAKDCTAEIKSTI 240

Query: 241 QGDKGSIAILGPTNTMPKISLTMNGQESHVYQLNGDRHRMHDEFVIFEGIISNLDFKRAA 300

QG+KGS+A+LG TNT+P++ L+++G E V N HRM++EFV F +1 DF++
Sbjct: 241 QGNKGSIAVLGATNTLPQVQLSLHGHEPQVINHNKHDHRMYEEFVAFRDMIDQRDFEKVN 300

Query: 301 QALEHSRTVMKVLD 314

CALEHSR VM VL+
Sbjct: 301 QALEHSRAVMAVLE 314





SEQ ID 2266 (GBS342) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 72 (lane 10; MW 36.6kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 81 (lane 2; MW 61kDa).

GBS342-GST was purified as shown in Figure 226, lane 3. 
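
The two reported band sizes can be sanity-checked with a rule-of-thumb mass estimate. The sketch below is only illustrative and rests on stated assumptions: a length of about 314 residues (taken from the alignment coordinates above, not from the sequence listing), an average residue mass of roughly 110 Da, and a GST tag of roughly 26 kDa. The function name `estimate_mass_kda` is hypothetical.

```python
AVG_RESIDUE_KDA = 0.110  # rule-of-thumb average mass per residue (assumption)
GST_TAG_KDA = 26.0       # approximate mass of a GST tag (assumption)

def estimate_mass_kda(n_residues, tag_kda=0.0):
    """Approximate molecular weight in kDa of a protein plus an optional tag."""
    return n_residues * AVG_RESIDUE_KDA + tag_kda

# Assumed length of about 314 residues, inferred from the alignment coordinates.
untagged = estimate_mass_kda(314)
gst_fusion = estimate_mass_kda(314, GST_TAG_KDA)
print(round(untagged, 1), round(gst_fusion, 1))
```

Under these assumptions the estimates fall close to the 36.6kDa and 61kDa values quoted for the His-fusion and GST-fusion products; the His-fusion value additionally includes the tag and linker, which the untagged estimate omits.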

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 



Example 740 

A DNA sequence (GBSx0786) was identified in S.agalactiae <SEQ ID 2267> which encodes the amino
acid sequence <SEQ ID 2268>. Analysis of this protein sequence reveals the following: 

Possible site: 19 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0499 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database:



^ MI IG V +YV + + + FW KVGF+ G +++ VAPK +E V++ K 

Sbjct: 1 MIKQIGWAVYVEDQQKAKQFWTEKVGFDIARDHPMGPEASWLEVAPK-GAETRLVIYPK 59 

Query: 60 AI IAQMSPELDLATPSILFETTDIDSTYQELTAN- -EVMTNP- 1 VDMGSMRVFNFSDNDN 116 

A M + SI+FE DI TY+++ NE+P++G+ FDD 
Sbjct: 60 A MMKGSEQMKASIVFECEDIFGTYEKMiCTNGVEFLGEPNQMEWGTF- -VQFKDEDG 113 

Query: 117 NYFAIRE 123 

N F ++E 
Sbjct: 114 NVFLLKE 120 

No corresponding DNA sequence was identified in S.pyogenes. 






Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 741 

A DNA sequence (GBSx0787) was identified in S.agalactiae <SEQ ID 2269> which encodes the amino 
acid sequence <SEQ ID 2270>. Analysis of this protein sequence reveals the following:

Possible site: 37 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.3402 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database: 

>GP:BAB04569 GB:AP001510 unknown conserved protein in others 
[Bacillus halodurans] 
Identities = 46/144 (31%) , Positives = 83/144 (56%) , Gaps = 10/144 (6%) 

Query: 1 WKALEIYIVTNGNGRQAVDFYKDVFQADLWMfflTOEM- - DPNC- -LEDRKDLI INAQL 56 

M+ + Y++ 4G+G+ A++FY+D A+++ + T+ ++ PN KDLI++A L 

Sbjct: 1 MILTMNPYLMLDGDGQAAIEFYQDALNAEVITIQTYGDLPEQPNSPMASWKDLILHAHL 60 

Query: 57 IFDGIRLQISDENPD FVYQAGKNVTAAIIVGSVEEAREIYEKLKKSAQEVQLELQ 111 

+ L ISD+ D F +G VT A+ +VE E+++KL +E+ L+ 

Sbjct: 61 KLGEMDMISDQCLDVDPERFFQHSGSPVTIALTTNNVEMTTEVFQKLASGGEEIA-PLE 119 



Query: 112 ETFWSPAYANL VDQFGVMWQI STE 135 

+TF+SP Y + D+FG+ W +ST+ 
Sbjct: 120 KTFFSPLYGQVTDKFGITWHVSTQ 143 



No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.



Example 742 

A DNA sequence (GBSx0788) was identified in S.agalactiae <SEQ ID 2271> which encodes the amino
acid sequence <SEQ ID 2272>. Analysis of this protein sequence reveals the following: 

Possible site: 42 

>>> Seems to have no N-terminal signal sequence 

Final Results

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database:

>GP:BAB03784 GB:AP001507 UDP-N-acetylglucosamine pyrophosphorylase 
[Bacillus halodurans] 
Identities = 238/453 (52%), Positives = 322/453 (70%), Gaps = 1/453 (0%) 

Query: 1 MSN-YAIIIAAGKGTRMKSDBPKVimKVSGITMLEHVTRSVQAIEESKIVTVIGHKAELV 59

MSN +A+ILAAG+GTRMKS L KV+H V G M++HV V A+ +IVT+IGH A+ V 
Sbjct: 1 MSNRFAVIIJ^GCGTRMKSKLYKVLHSVCGKP^WQHVvDQVSALGFDEIVTIIGHGADAV 60 



Query: 60 RDVLGDKSEFVMQTEQLGTGHAVMMAEEELATSKGHTLVIAGDTPLITGESLKNLIDFHV 119 
+ LG++ + +Q EQLGTGHAV+ AE h +G T+V+ GDTPL+T E++ +++ +H
Sbjct: 61 KSQLGERVSYALQEEQLGTGHAVLQAESALGGRRG'/TIVLCGDTPLLTAETIDHVMSYHE 120

Query: 120 NHKimTILTADAANPFGYGRIIRNSDDEOTKIVEQKDANDFEQQVKEINTGTyVFDNQS 179 

+ AT+LTA+ A+P GYGRI+RN V +IVE KDA E+Q+ E+NTGTY FDN++ 
Sbjct: 121 EEQAKATVLTAELADPTGYGRIVRNDKGLVERIVEHKDATSEEKQITEVNTGTYCFDNEA 180 

Query: 180 LFFALKDimmAQGEYYLTDVlGIFKEaGKKVGAYKIiRDFDESLGTODRVALATAEKVM 239 

LF+ALK++ NNAQGEYYL DVI I + G4KV AYK +E+LGVNDRVALA AE+VM 
Sbjct: 181 LFQALKEVGN^AQGEYYLPDVIQILQTKGEKVAAYKTAHVEETLGVNDRVALAQAEQVM 240 

Query: 240 RHRIARQHMVHGVTWNPDSAYIDI DVE IGEESVTE PMVTLKGQTKIGKGTLL1HGSYLV 299 

+ RI M GVT ++P+ Y+ D IG+++VT P + GQT IG+G +L 4 L 
Sbjct: 241 KRRINEAVmRKGVTFIDPEQTYVSPDATIGQDTVIYPGT^WLGQTTIGEGCVLGPHTELK 300 

Query: 300 DAQVGNDVTITNSMVEESIISDGVTVGPYAHIRPGTSLAKGVHIGNFVEVKGSQIGENTK 359 

D+++GN + S+V S + + V++GP++HIRP + + V IGNFVEVK S IG+ +K 
Sbjct: 3 01 DSKIGNKTAVKQSWHNSEVGERVSIGPFSHIRPASMIHDDVRIGNFVEVKKSTIGKESK 3 60 

Query: 360 AGHLTYIG1IAEVGCDVNFGAGTITVNYDGQNKFKTEIGSNVFIGSNSTLXAPLEIGDNAL 419 

A HL+YIG+AEVG VNF G+ITVNYDG+NKF T+I + FIG NS LIAP+ IG AL 
Sbjct: 351 ASHLSYIGDAEVGERVNFSCGSITVNYDGKNKFLTKrEDDflFIGCMSMLIAPUTIGKjGAL 420 

Query: 420 TAAGSTITDNVPIDSIAIGRGRQVNKEGYANKK 452 

AAGSTIT++VP D+++I R RQ NKE Y KK 
Sbjct: 421 IAAGSTITEDVPSDALSIARARQTNKEHYVTKK 453 

A related DNA sequence was identified in S.pyogenes <SEQ ID 2273> which encodes the amino acid 
sequence <SEQ ID 2274>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.0461 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 345/458 (75%) , Positives = 398/458 (86%) 
Query: 
Sbjct: 

Query: 61 DVLGDKSEFVMQTEQLGTGHAV^IMAEEEIATSKGHTLVIAGDTPLITGESLKNLIDFHVN 120 

VL D+S FV QTEQLGTGHAVMMAE +L +GHTbVIAGDTPJjITGESLK+LIDFHVN 
Sbjct: 61 AVIADQSAFVHQTEQLGTGHAVMMAETQLEGLEGHTLVIAGDTPLITGESLKSLIDFHVN 120 

Query: 121 HKNVATILTADAANPFGYGRIIRWSDDEVTKIVEQKDANDFEQQVKEINTGTyVFDNQSL 180 

HKNVATILTA A +PFGYGRI+RN D EV KIVEQKDAN++EQQ+KEINTGTYVFDN+ L 
Sbjct: 121 HKOTATILTATAQDPFGYGRIVRNKTJGEVIKIVEQKDA^YEQQLKEINTGTYVFDNKRL 180 

Query: 181 FEMKDINimAQGEYYLTDVIGIETKEftGKKVGAYKIiRDFDESLGVNDRVAIATAEKVMR 240 

FEALK I TNNAQGEYYLTDV+ IF+ +KVGAY LRDF+ESLGVNDRVALA AE VMR 
Sbjct: 181 FFALKCITTNNAQGEYYLTDVVAIFRANKEKVGAYIIjRDFNESLGVNDRVAIAIAETV^ 240 

Query: 241 HRIARQH^IvNGVTvA/NPDSAYIDIDVEIGEESVIEPNVTLKGQTKIGKGTLLTNGSYLVD 300 

RI ++HMVNGVT NP++ YI+ DVEI + +IE NVTLKG+T IG GT+LTNG+Y+VD 
Sbjct: 241 QRITQKHIWNGVTFQNPETVYIESDVEIAPDVLIEGNOTLKGRTHIGSGTVLTNGTYIVD 300 

Query: 301 AQVGNDVTITNSMvEESIISDGVTOGPYAHIRPGTSLAKGVHIGNFVEVKGSQIGENTKA 360 

+++G++ +TNSM+E S+++ GVTVGPYAH+RPGT+ti + VHIGNFVEVKGS IGE TKA 
Sbjct: 301 SEIGDNCVVTNSMIESSVLAAGVTVGPYAHLRPGTTLDREVHIGNFVEVKGSHIGEKTKA 360 



Query: 361 GHLTYIGNAEVGCDVNFGAGTITVNYDGQNKFKTEIGSNVFIGSNSTLIAPLEIGDNALT 420 
GHLTYIGNA+VG VN GAGTITVNYDGQNK++T IG + FIGSNSTLIAPLE+GD+ALT
Sbjct: 361 GHLTYIGNAQVGSSWVGAGTITVKYDGQNKYETVIGDHkFIGSNSTLIAPLEVSDHALT 420



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 743 

A DNA sequence (GBSx0790) was identified in S.agalactiae <SEQ ID 2275> which encodes the amino 
acid sequence <SEQ ID 2276>. Analysis of this protein sequence reveals the following: 

Possible site: 52 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1366 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database:

>GP:CAB14293 GB:Z99116 similar to hypothetical proteins [Bacillus subtilis] 
Identities = 92/177 (51%), Positives = 124/177 (69%), Gaps = 4/177 (2%) 

Query: 4 EEKTINRQTVFDGQIIKVATODVBLPNGLGQSKRELVFHGGAVATLAVTPEHKIVLVKQY 63 
EEKTI ++ +F G++I + V+DVELPNG SKRE+V H GAVA LAVT E KI++VKQ+

Sbjct: 5 EEKTIAKE0IFSGKVIDLYVBDVELPNGKA-SKREIVKHPGAVAV1AVTDEGKIIMVKQF 63 

Query: 64 RKAIEGISYEIPAGKLETGESGSKEEAALRELEEETGTTG-NLEILYSFYTAIGFCNEKI 122 
RK +E EIPAGKLE GE E ALRELEEETGYT L + +FYT+ GF +E + 
Sbjct: 64 RKPLERTIVEIPAGKLEKGE— EPEYTALRELEEETGYTAKKLTKITAFYTSPGFADEIV 121



Query: 123 VLYLATDLQKVENPRPQDDDEVLEBLELSYEDCMQMVEKGMIQDAKTIIALQYYGrjK 179 

++LA +L +E R D+DE +E++E++ ED +++VE + DAKI A+QY LK 
Sbjct: 122 WFLAEELSVLEEKRELDEDEFVEVMEVTDEDALKL^SREVYDAKTAYAIQYLQLK 178 

A related DNA sequence was identified in S.pyogenes <SEQ ID 2277> which encodes the amino acid 
sequence <SEQ ID 2278>. Analysis of this protein sequence reveals the following: 

Possible site: 50 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1120 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 136/182 (74%) , Positives = 153/182 (83%) 



Query: 1 MDFEEKTIIfflQTVFDGQIIKVAVDDTOLPNGLGQSKRELVFHGGAVATLAvTPEHKIVLV 60 

M FEEKI+ RQTVFDG I KV VDDVELPN LC-QSKREL+FH GAVA LA+TPE KIVLV 
Sbjct: 1 MKFEEKTLKRQTVFDGHIFKVVVDDVELPWNLGQSKRELIFHRGAVAVDAITPERKIVLV 60 

Query: 61 KQYRKAIEGISYEIPAGKLETGESGSKEEAALRELEEETGYTGNLEILYSFYTAIGFCNE 120 

KQYRKAIE +SYEIPAGKLE GE GSK +AA REIiEEET YTG L LY FYTAIGFCNE 
Sbjct: 61 KQYRKAIERVSYEIPAGKLEIGEEGSKLKAAARELEEETAYTGTLTFLYEFYTAIGFCNE 120 

Query: 121 KIvTJYMTDLQKVENPRPQDDDEVlEIiELSYEDCMQMVEKGMIQDAKTIIALQYYGLKM 180 

KI L+LATDL +V NP+PQDDDEV+E+LEL+Y++CM +V +G + DAKT+IALQYY L 
Sbjct: 121 KITLFLATDLIQVANPKPQDDDEVIEVLELTYQECMDLVAQGPCLADAKTLIALQYYALHF 180 
Query: 181 GG 182
GG 

Sbjct: 181 GG 182 


Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 744 

A DNA sequence (GBSx0791) was identified in S.agalactiae <SEQ ID 2279> which encodes the amino 
acid sequence <SEQ ID 2280>. Analysis of this protein sequence reveals the following:

Possible site: 16 

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood =-15.44 Transmembrane 70 - 86 ( 64 - 88)

Final Results

bacterial membrane --- Certainty=0.7177 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.

A related DNA sequence was identified in S.pyogenes <SEQ ID 2281> which encodes the amino acid 
sequence <SEQ ID 2282>. Analysis of this protein sequence reveals the following: 
Possible site: 35 

>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood =-15.60 Transmembrane 65 - 81 ( 58 - 83)

Final Results

bacterial membrane --- Certainty=0.7241 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 39/89 (43%), Positives = 61/89 (67%), Gaps = 6/89 (6%)

Query: 1 MGKPLLTDDMIERSNRGEKVSGQTILDQETKIISTEDGMEQLTDENGKHIYKSRRIENAK 60
MG+PLLTDD+ IE++ RE ++ +TK+++ + ++ IYKSRRIENAK 

Sbjct: 2 MGRPLLTDDIIEKARRMETFEPDDAVNFDTKVMTLPE KDDKAR1YKSRRIENAK 55 

Query: 61 RNEFQRKLNLVLFILLILLALLFYAIFKL 89 
R++ Q KLN++L +++L+A+L YAIF L

Sbjct: 56 RSQLQSKLNVILIAVMLLIAILVYAIFYL 84 
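
In these listings the middle row of each alignment block carries the scoring information: a letter marks an identical residue, a "+" marks a conservative substitution, and a blank marks a mismatch. The sketch below shows how the Identities, Positives and Gaps counts follow from that convention. It uses a short hypothetical row triple rather than the rows above, because column spacing has not survived in this text; `score_midline` is an illustrative name.

```python
def score_midline(query, midline, subject):
    """Count identities, positives and gaps for one aligned row triple.

    All three strings must be column-aligned and of equal length.
    """
    if not (len(query) == len(midline) == len(subject)):
        raise ValueError("rows must have equal length")
    # A letter in the midline marks an identical pair of residues.
    identities = sum(1 for m in midline if m.isalpha())
    # Positives include identities plus conservative substitutions ("+").
    positives = identities + midline.count("+")
    # A gap is a "-" in either sequence row.
    gaps = sum(1 for q, s in zip(query, subject) if q == "-" or s == "-")
    return identities, positives, gaps

# Hypothetical eight-column example, not taken from the sequence listing.
identities, positives, gaps = score_midline(
    "MKLAV-LG",
    "MK+A  LG",
    "MKIATVLG",
)
print(identities, positives, gaps)
```

Summing these counts over every block of an alignment and dividing by the total number of columns gives the fractions shown in the summary line.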

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 745

A DNA sequence (GBSx0792) was identified in S.agalactiae <SEQ ID 2283> which encodes the amino
acid sequence <SEQ ID 2284>. This protein is predicted to be pfs protein (pfs). Analysis of this protein
sequence reveals the following:

Possible site: 55 
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -0.32 Transmembrane 56 - 72 ( 56 - 72)

Final Results

bacterial membrane --- Certainty=0.1128 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database:

>GP:AAC22869 GB:U32801 pfs protein (pfs) [Haemophilus influenzae Rd]
Identities = 100/229 (43%) , Positives = 144/229 (62%) 

Query: 1 MKIGIIAAMEEELKLLVBNLEDKSQETViSNVYySGRYGEHELvIiVQSGVGKVMSAMSVA 60
MKIGI+ AM +E+++L + D+++ V S V + G+ ++ L+QSG+GKV +A+
Sbjct: 1 MKIGIVGAMAQEVEILKNLMADRTETRVASAVI FEGKINGKDVALLQSGIGKVAAAIGTT 60

K GLI +GDSFI ++KI IK FP V VEME A1AQ

PFVWRA4-SD A++4-F+EF+ A K+S+ +



A related DNA sequence was identified in S.pyogenes <SEQ ID 2285> which encodes the amino acid 
sequence <SEQ ID 2286>. Analysis of this protein sequence reveals the following: 

Possible site: 23 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1245 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 169/229 (73%) , Positives = 189/229 (81%) 

Query: 1 MKIGIIAAMEEELKLLvENLEDKSQETVTLSNVYYSGRYGEHELvlVQSGVGIOTMSAMSVA 60 

MKIGI IAAMEEEL LL+ NL D + VLS YY+GR+G+HEL+LVQSGVGKVMSAM+VA 
Sbjct: 1 MKIGIIAAMEEELSLLLANLLDAQEHQVLSKTYYTGRFGKHELILVQSGVGKVMSAMTVA 60 

Query: 61 ILVESFIWDAIINTGSAGAVATGLNVGDVVVADTLVYHDVDLTAFGYDYGQMSMQPLYFH 120 

ILVE FK AIINTGSAGAVA+ L +GDWVAD LVYHDVD TAFGY YGQM+ QPLY+ 
Sbjct: 61 ILVEHFKAQAIINTGSAGAVASHLAIGDWVADRLVYHDVDATAFGYAYGQMAGQPLYYD 120 

Query: 121 SDKTFVSTFEAVLSKEEMISKVGLIATGDSFIAGQEKIDVIKGHFPQVLAVEMEGAAIAQ 180 

D FV+ F+ VL E+ +VGLIATGDSF+AGQ+KID IK F VLAVEMEGAAIAQ 
Sbjct: 121 CDPQWAIFKQVLKHEKIWGQVGLIATGDEFVAGQDKIDQIKTAFSDVLAVEMEGAAIAQ 180 

Query: 181 AaCATGKPFWVRAMSDTAAHDANITFDEFIlEAGKRSAQVLMAFLKAL 229 

AA GKPF+WRAMSDTAAHDANITFD+FIIEAGKRSAQ LM FL+ L 
Sbjct: 181 AAHTAGKPFIWRAMSDTAAHDANITFDQFIIEAGKRSAQTLMTFLENL 229 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 746 

A DNA sequence (GBSx0793) was identified in S.agalactiae <SEQ ID 2287> which encodes the amino 
acid sequence <SEQ ID 2288>. This protein is predicted to be SloR. Analysis of this protein sequence 
reveals the following: 
>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.3777 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9405> which encodes amino acid sequence <SEQ ID 9406> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database: 



Query: 1 MSEMIKKMISEQLIVKDKDLGYYLTKQGLLWSDLYRKHRLVEVFLVNHLHYTM3DIHEE 60 

+SEM+KK++ E L++KDK GY LTK+G +4- S LYRKHRL+EVFL+NHL+YTAD+IHEE 
Sbjct: 38 VSEMVKKLLLEDLVLKDKQAGYLLTKKGQILASSLYRKHRLIEVFLMNHLNYTADEIHEE 97 

Query: 61 AEVLEHTVSTTFVDQLEKLLDFPQFCPHGGTIPKKGEFLVEINQMTLDQISQLGTYVISR 120 

AEVLEHTVS FV++L+K L++P+ CPHGGTIP+ G+ LVE + TL 4+++G Y++ R 
Sbjct: 98 AEVLEHTVSDVFVERLDKFLNYPKVCPHGGTIPQHGQPLVERYRTTLKGVTEMGVYLLKR 157 

Query: 121 VHDDFQLLKYLEQHRLHINDTIELTQIDPYAKrYHITYMDENLTIPERIASQIYV 175 

V D+FQLLKY+EQH LID + L + D+A Yl +EL+ +ASQIY+ 
Sbjct: 158 VQDNFQLLKYMEQHHLKIGDELRLLEYDAFAGAYT1EKDGEQLQVTSAVASQIY1 212 

A related DNA sequence was identified in S.pyogenes <SEQ ID 2289> which encodes the amino acid 
sequence <SEQ ID 2290>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results 

bacterial cytoplasm --- Certainty=0.2910 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 44/75 (58%), Positives = 59/75 (78%) 

Query: 1 MSEMIKKMISEQLIVKDKDLGYYLTKQGLLWSDLYRKHRLVEVFLVNHLHYTADDIHEE 60 

+SEMIKKMIS+ IVKDK GY L +G +V++LYRK RL4-EVFL++ L Y ++H+E 
Sbjct: 38 VSEMIKKMISQGWIVraKAKGYLLKTKGYALVANLYRKLRLIEVFLIHQLGYNTQEVHQE 97 

Query: 61 AEVLEHTVSTTFvDQ 75 

AEVLEHTVS +F+D+ 
Sbjct: 98 AEVLEHTVSDSFIDR 112 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 747 

A DNA sequence (GBSx0794) was identified in S.agalactiae <SEQ ID 2291> which encodes the amino
acid sequence <SEQ ID 2292>. This protein is predicted to be undecaprenyl pyrophosphate synthetase 
(uppS). Analysis of this protein sequence reveals the following: 
Possible site: 46 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3569 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9435> which encodes amino acid sequence <SEQ ID 9436> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database: 



Query: 1 I^PWFFDKWPELDKNNTOVQVIGDTHKLPKATYDA>lQRACLRTKHNSGLVIiNFArOT 60 

M LP +F + Y+PEL + NV+V++IGD LP T A+++A T N G++LNFALNY 

Sbjct: 100 MKLPEEFLNTyLPELVEEOTQTOIIGDETALPAHTLRAIEKAVQDTAQNDGMIMFALNY 159 

Query: 61 GGRSEITNAIKEIAQDVLEAKLNPDD1TEDLVANHLMTNSLPYLYRDPDLIIRTSGELRL 120 

GGR+EI +A K +A+ V E LN +DI E L + +LMT SL +DP+L+IRTSGE+RL 

Sbjct: 160 GGRTEIVSAAKSLAEKVKEGSLNIEDIDESLFSTYLMTESL QDPELLIRTSGEIRL 215 



A related DNA sequence was identified in S.pyogenes <SEQ ID 2293> which encodes the amino acid 
sequence <SEQ ID 2294>. Analysis of this protein sequence reveals the following: 

Possible site: 57 

>>> Seems to have no N-terminal signal sequence 



Final Results 

bacterial cytoplasm --- Certainty=0.2073 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below: 

Identities = 125/165 (75%) , Positives = 145/165 (87%) 

Query: 1 MNLPVKFFDKTVPELDKNNTOVQVIGDTHKLPKATYDAMQRACLRTKHNSGLVIjNFAIjOT 60 

MNLPV FFDKYVP L +NNV++Q+IG+T +LP+ T A+ A +TK N+GL+LNFALNY 
Sbjct: 85 MNLPVTFFDKYVPVLHENNVKIQMIGETSRLPEDTLAAI^AAIDKTKPJWGLIMFAIJTO 144 

Query: 61 GGRSEITNAIKEIAQDVLEAKLNPDDITEDLVANHLMTNSLPYLYRDPDLIIRTSGELRL 120 

GGR+EIT+A++ IAQDVL+AKLNP DITEDL+AN+LMT+ LPYLYRDPDLI IRTSGELRL 
Sbjct: 145 GGRAEITSAVRFIAQDVLDAKLNPGDITEDLIANYLMTDHLPYLYRDPDLIIRTSGELRL 204 

Query: 121 SNFLPWQSAYSEFYFTPVLWPDFKKDELHKAIVDYNQRHRRFGSV 165 

SNFLPWQSAYSEFYFTPVLWPDFKK EL KAI DYN+R RRFG V 
Sbjct: 205 SNFLPWQSAYSEFYFTPVLWPDFKKAELLKAIADYNRRQRRFGKV 249 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 748 

A DNA sequence (GBSx0795) was identified in S.agalactiae <SEQ ID 2295> which encodes the amino 
acid sequence <SEQ ID 2296>. This protein is predicted to be phosphatidate cytidylyltransferase (cdsA). 
Analysis of this protein sequence reveals the following: 

Possible site: 22 

>>> Seems to have a cleavable N-term signal seq.
INTEGRAL
INTEGRAL 
INTEGRAL 
INTEGRAL 



Likelihood = • 

Likelihood = ■ 

Likelihood = • 

Likelihood = • 

Likelihood = ■ 

Likelihood = - 

Likelihood = ■ 



Transmembrane 
Transmembrane 
Transmembrane 



194 - 222
170 - 197

Final Results

bacterial membrane --- Certainty=0.4461 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database: 

>GP:BAB06141 GB:AP001515 phosphatidate cytidylyltransferase
[Bacillus halodurans] 
Identities = 116/266 (43%) , Positives = 172/266 (64%) , Gaps = 6/266 (2%) 





1 


Sbjct: 


1 


Query: 


61 


Sb j ct : 


61 






Sbjct: 


121 


Query: 


178 


Sbjct: 


181 


Query: 


238 


Sbjct: 


238 



+F+ F+V+GGLPF + ++A I +SELL+M+++ 



TVL N+++F++A F I SS Y+G G 



FQNLVSARMA- - -GIDKVLLALFIVWATDIGAYMIGRQFGQRKLLPSVSPNKTIEGSLGG 177 
F L+ +R G+ V LF++WATD GAY GR FG+ KL P +SPNKTIEGS+GG 

[KLWPHISPNKTIEGSIGG 180 



^ S+FGQ GDLVES++KRH+ VKDSG ■) 



+ PGHGGILDRFDS+I+V PI+H 



A related DNA sequence was identified in S.pyogenes <SEQ ID 2297> which encodes the amino acid 
sequence <SEQ ID 2298>. Analysis of this protein sequence reveals the following: 



Possible site: 61 
>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood =
INTEGRAL Likelihood =
INTEGRAL Likelihood =
INTEGRAL Likelihood =
INTEGRAL Likelihood =
INTEGRAL Likelihood =
INTEGRAL Likelihood =
INTEGRAL Likelihood =
Transmembrane 1 
Transmembrane 
Transmembrane 2 
Transmembrane 
Transmembrane 
Transmembrane 1 



197 - 222: 



135 - 153 



bacterial membrane --- Certainty=0.4991 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases: 

>GP:BAB06141 GB:AP001515 phosphatidate cytidylyltransferase
[Bacillus halodurans] 
Identities = 125/266 (46%) , Positives = 177/266 (65%) , Gaps = 6/266 (2%) 

Query: 1 MKERWWGGVAVAIFLPFLIIGNLPFQLFVGVLAMIGVSELLKMKRLEVFSFEGVFAMLA 60 

MK+RW + +FL F+++G LPF +F+ V+A I +SELLKMK++ FS G F++L 
Sbjct: 1 MKQRWTAIIFGLVFLTFWVGGLPFTMFIIWATIAMSELLKMKKIAPFSPMGAFSLLP 60 
Query:


61 


Sb j ct : 


61 


Query: 


121 


Sbjct: 


121 


Query: 


178 


Sbjct: 


181 




238 


Sbjct: 





I AV++ +F I +++ + +++VA S+P Q GDLVESALKRH+ VKDSG 4 



An alignment of the GAS and GBS proteins is shown below: 

Identities = 204/264 (77%) , Positives = 243/264 (91%) 
Query: 
Sbj< 



Query: 
Sbj< 

Sbji 

Sbjct 
Query: 

Sbji 



MKERVIWGAVMAIFIPFLVMGGLPFQFLVC3LIANIGVSELLRMRRLEIFSFEGALAMIG 6 0 
MKERV+WG VA4AIF+PFL++G LPFQ VG+LAMIGVSELL+M+RLE+FSFEG AM4 
MKERWWGGVAVAIFLPFLIIGNLPFQLFVGVIAMIGVSELLKMKRLEVFSFEGVFAMLA 60 



181 AIVVAFFFMLFDKTWAPHSFLVMLVLVAIFSIFGQFGDLVESSIKRHFGVKDSGKLIPG 240 

A++V+F FM+ D++VYAPH FL MLVLVA+FSIF QFGDLVES++KRHFGVKDSGKLIPG 
181 AVLVSFIFMVIDRSVYAPHHFLTMLVLVALFSIFAQFGDLVESALKRHFGVKDSGKLIPG 240 

241 HGGILDRFDSMIFVFPIMHFFGLF 264 

HGGILDRFDSMI FVFP1MH FGLF 
241 HGGILDRFDSMIFVFPIMHLFGLF 264 



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 749 

A DNA sequence (GBSx0796) was identified in S.agalactiae <SEQ ID 2299> which encodes the amino 
acid sequence <SEQ ID 2300>. Analysis of this protein sequence reveals the following:

Possible site: 46 

>>> Seems to have an uncleavable N-term signal seq 

INTEGRAL Likelihood =-11.09 Transmembrane 2 - 18 ( 1-25) 

INTEGRAL Likelihood = -9.39 Transmembrane 394 - 410 ( 390 - 415) 

INTEGRAL Likelihood = -8.01 Transmembrane 181 - 197 ( 173 - 198)

INTEGRAL Likelihood = -2.S7 Transmembrane 343 - 359 ( 342 - 360) 

Final Results 

bacterial membrane --- Certainty=0.5437 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:AAD47948 GB:AF152237 Eep [Enterococcus faecalis] 
Identities = 229/425 (53%), Positives = 298/425 (69%), Gaps = 9/425 (2%)
1


Sbjct: 


1 


Query: 


61 


Sbjct: 


61 


Query: 


121 


Sbjct: 


121 


Query: 


178 


Sbjct: 


181 


Query: 


237 


Sbjct: 


241 


Query: 


295 


Sbjct: 


2S8 


Query: 


355 


Sbjct: 


358 




415 


Sbjct: 


418 



MLGILTFIIIFGVIWVHEFGHFYFAKKSGILWEFAIGMGPKIFSHIDKEGTTYTIRIL 60 
M I+TFII+FG+ + V+VHEFGHFYFAK+ +GI LVREFAIGMGPKI F+H K+GTTYTIR+L 

MKTIITFIIVFGILVLVHEFGHFYFAKRAGILVREFAIGMGPKIFAHRGKDGTTYTIRLL 60 

PLGGYVPJIAGWGDDKTEIKTGTPASLTIHKEGIWRINLSGKQLDHTSLPIKTVTAYDLED 120 
P+GGYVRMAG G+D TEI G P S+ LN G V +IN S K S+P+ V +DLE 

PIGGYVRMAGMGEDMTEITPGMPLS'\"E I I "i' i 'TSKKVQLPHSIPMEWDEDLEK 120 



NNFILG ++F F+QGGV DL+TNQ+ +V NGPAA AGLK ND++L I + K+ +E 



L L FS+NXLGGPV 4 



+NLGI+NL+PIPALDGGKIV+NI+E +R KP+ 



- S 4-A+ G TV+ LM ++S 



+ +VLM+ VTWNDI 



A related DNA sequence was identified in S.pyogenes <SEQ ID 2301> which encodes the amino acid 
sequence <SEQ ID 2302>. Analysis of this protein sequence reveals the following: 



Possible site: 26 
>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood =-11.41 Transmembrane 2 - 18 ( 1 - 25)

INTEGRAL Likelihood = -9.77 Transmembrane 394 - 410 ( 390 - 415)

INTEGRAL Likelihood = -9.61 Transmembrane 180 - 196 ( 173 - 201)

INTEGRAL Likelihood = -2.66 Transmembrane 347 - 363 ( 343 - 363)

Final Results

bacterial membrane --- Certainty=0.55S4 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the databases: 



Query: 
Sbjct: 



MLGIITFIIIFGILVIVHEFGHFYFAKKSGILVREFAIGMGPKIFSHVDQGGTLYTLRML 60 
M IITFII+FGILV+VHEFGHFYFAK++GILVREFAIGMGPKIF+H + GT YT+R+L 
MKTIITFIIVFGILVLVHEFGHFYFAKRAGILVREFAIGMGPKIFAHRGKDGTTYTIRLL 60 



Query: 61 PLGGYVFmGWGDDKTEIKTGTPASLTLNEQGFVKRINLSQSKLDPTSLPMHVTGYDLED 120 

P+GGYVRMAG G+D TEI G P S+ LN G V +IN S+ P S+PM V +DLE 

Sbjct: 61 PIGGYVFmGMGEDMTEITPGMPLSVELNAVGNVVKINTSKKVQLPHSIPMEVVDFDLEK 120 

Query: 121 QLSITGLV LEETKTYKVAHDATIVEEDGTEIRIAPLDVQYQNASIGGRLITNFAGPM 177 

+L I G V EE YKV HDATI+E DGTE+RIAPLDVQ+Q+A + R++TNFAGPM 

Sbjct: 121 ELFIKGYVNGNEEEETVYKVDHDATIIESDGTEVRIAPLDVQFQSAKLSQRILTNFAGPM 180 

Query: 178 NNFILGIWFILLWLQGGMPDFSSNHV-RVQENGAAAKAGLRDNDQIVAINGYKVTSWN 236 

NNFILG ++F L VFLQGG+ D ++N + +V NG AA+AGL++ND++++LN K+ + 

Sbjct: 181 NNFILGFILFTIAWLCjGGVTDLinHQIGQVIPNGPAi^EAGLKENDKVLSINNQKlKKYE 240 
Query: 237 DLTEAVDLATRDLGPSQTIKVTYKSHQRLKTVAVKPQKH-AKTYTI GVKASLKTGFK 292

DTV P + + + + + + + V P+K + TI GV +KT 

Sbjct: 241 DFTTIV QKNPEKPLTFVVERNGKEEQLTVTPEBGQKVEKQTIGKVGVYPYMfCTDLP 295 

Query: 293 DKLLGGLEIAWSRAFTIIoNALKGLITGFSLNKLGGPVAimJMSNQAAQNGLESVLSLMAM 352 

KL+GG++ + I AL L TGFSIMKLGGPV M+ +S +A+ G+ +V+ LMAM 
Sbjct: 296 SKLMGGIQDTLNSTTQIFKALGSLFTGFSIOTLGGPVMMFKLSEEASNAGVSTWFLMAM 355 

Query: 353 LSINLGIFNLIPIPALDGGKILMNIIFAIRRKPIKQETEAYITIAGVAIMVVLMIAVTWN 412 

LS+NLGI NL+PIPALDGGKI++NIIE +R KPI EE ITL G ++VLM+ VTWN 
Sbjct: 356 LSMNLGI INLLPI PALDGGKI VLNI IEGVRGKPI S PEKEGI ITLIGFGFVMVLMVLVTWN 415 

Query: 413 DIMRVFF 419 

DI R FF 
Sbjct: 416 DIQRFFF 422 

An alignment of the GAS and GBS proteins is shown below: 

Identities = 306/419 (73%) , Positives = 359/419 (85%) 

Query: 1 MLGILTFIIIFGVI VWHEFGHFYFAKKSGILWEFAIGMGPKIFSHIDKEGTTYTIRIL 60
MLGI+TFIIIFG++V+VHEFGHFYFAKKSGILVREFAIGMGPKIFSH+D+ GT YT+R+L
Sbjct: 1 MLGIITFlIIFGILVIVHEFGHFYFAKIfflGILVREFAIGMGPKIFSHVDQGGTLYTLRML 60

Query: 61 PLGGYVRMAGWGDDKTEIKTGTPASLTLNKEGIVTRINLSGKQLDNTSLPIMVTAYDIiED 120
PLGGYVRMAGWGDDKTEIKTGTPASLTLN++G V RINLS +LD TSLP++VT YDLED

+L+ITGLVL ETKTY V KDATI+SEDGTEIRIAPLD+QYQNAS+ GRLITNFAGPMNNF

ILG+WFI L F+QGG+ D S+N VRV ENG AA AGL++ND+I+ 1 +KV++W LT

(• + KS + +KT+ VKPQK K+Y IG+ +LKT FKDKLLGGL+

FS+NKLGGPVA+Y S+QAA+NG +VL+LM ++SINLGI

•M+VLMIAVTWNDIMR FF



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.



Example 750 

A DNA sequence (GBSx0797) was identified in S.agalactiae <SEQ ID 2303> which encodes the amino 
acid sequence <SEQ ID 2304>. This protein is predicted to be prolyl-tRNA synthetase (proS). Analysis of 
this protein sequence reveals the following: 

Possible site: 18

>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -0.32

Final Results

bacterial membrane --- Certainty=0.1128 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10181> which encodes amino acid sequence <SEQ ID
10182> was also identified.

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAB13530 GB:Z99112 prolyl-tRNA synthetase [Bacillus subtilis]
Identities = 301/608 (49%) , Positives = 410/608 (66%) , Gaps = 52/608 (8%) 

Query: 1 MKQSKMLIPTLREMPSDAQVISHALMVRAGYVRQVSAGIYAYLPIANRTIEKFKTIMRQE 60
M+QS LIPTLRE+P+DA+ SH L++RAG++RQ ++G+Y+Y+PIA + 1+ + I+R+E
Sbjct: 1 MRQSLTLIPTLREVPADAEAKSHQLLLRAGFIRQNTSGVYSYMPIAYKVIQNIQQIVREE 60

Query: 61 [illegible]
Sbjct: 61 [illegible]

Query: 121 [illegible]
VKSYK+LPL LYQIQSK+RDEKRPR GLLR REFIMXD YSFH E LD TY+ +A
Sbjct: 121 [illegible]

Query: 181 [illegible]
Y IF R G++ + +1 D GAMGGKD+ EFMA++
Sbjct: 181 [illegible]

Query: 241 [illegible]
Sbjct: 215 [illegible]

Query: 301 [illegible]
IK++LF AD++ V+ L+ G+ +VND+K+KN L A+ +E A+ E+
Sbjct: 263 [illegible]

Query: 361 [illegible]
V++ AD+ V+ + NAV+GAN+ +H+ VN RD
Sbjct: 323 [illegible]

Query: 420 [illegible]
D+R +KEG+ SPDGKGT++FA G1E+G +FKLGTRYS++M A LDENGR+ P++MGCYG
Sbjct: 383 [illegible]

Query: 480 [illegible]
Sbjct: [illegible]

Query: 540 [illegible]
EK+ ADL +GYEVL DDR ER G KF+DSDLIGLPIR+TVGK+A EGIVEVKI+ +G++
Sbjct: 490 [illegible]

Query: 600 [illegible]
Sbjct: 550 [illegible]



A related DNA sequence was identified in S.pyogenes <SEQ ID 2305> which encodes the amino acid 
sequence <SEQ ID 2306>. Analysis of this protein sequence reveals the following: 

Possible site: 18

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -0.32 Transmembrane 473 - 489 ( 473 - 490)

Final Results

bacterial membrane --- Certainty=0.1128 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below:

Identities = 535/617 (86%) , Positives = 584/617 (93%) 

Query: 1 MKQSKMLIPTLREMPSDAQVISHALMVRAGYVRQVSAGIYAYLPLANRTIEKFKTIMRQE 60
MKQSK+LIPTLREMPSDAQVISHALMVRAGYVRQVSAGIYAYLPLANRTIEKFKTIMR+E

Sbjct: 1 MKQSKLLIPTLREMPSDAQVISHALMVRAGYVRQVSAGIYAYLPLANRTIEKFKTIMREE 60

Query: 61 FEKIGAVEMLAPALLTADLWRESGRYETYGEDLYKLKNRDQSDFILGPTHEETFTTLVRD 120

FEKIGAVEMIAPALLTADLWRESGRYETYGEDLYKLKNRD SDFILGPTHEETFTTLVRD
Sbjct: 61 FEKIGAVEMIAPALLTADLWRESGRYETYGEDLYKLKNRDNSDFILGPTHEETFTTLVRD 120

Query: 121 AVKSYKQLPLNLYQIQSKYRDEKRPRNGLLRTREFIMKDGYSFHKDYEDLDVTYEDYRKA 180 

AVKSYKQLPLNLYQIQSKYRDEKRPRNGLLRTREFIMKDGYSFH +YEDLDVTYEDYR+A 
Sbjct: 121 AVKSYKQLPLNLYQIQSKYRDEKRPRNGLLRTREFIMKDGYSFHHNYEDLDVTYEDYRQA 180 

Query: 181 YEAIFTRAGLDFKGIIGDGGAMGGKDSQEFMAVTPNRTDLNRWLVLDKTIPSIDDIPEDV 240

YEAIFTRAGLDFKGIIGDGGAMGGKDSQEFMA+TP RTDL+RW+ VLDK+I S+DDIP++V
Sbjct: 181 YEAIFTRAGLDFKGIIGDGGAMGGKDSQEFMAMTPARTDLDRWVVLDKSIASMDDIPKEV 240

Query: 241 LEEIKVELSAWLVSGEDTIAYSTESSYAANLEMATNEYKPSTKAATFEEVTKVETPNCKS 300

LE+IK EL+AW+ + SGEDTIAYSTESSYAANLEMATNEYKPS +K A + + +VETP+CK+
Sbjct: 241 LEDIKAELAAWIISGEDTIAYSTESSYAANLEMATNEYKPSSKVAAEDALAEVETPHCKT 300

Query: 301 IDEVAGFLSIDENQTIKTLLFIADEQPVVALLVGNDQVNDVKLKNYLAADFLEPASEEQA 360

IDEVA FLS+DE QTIKTLLF+AD +PVVALLVGND +N VKLKNYLAADFLEPASEE+A
Sbjct: 301 IDEVAAFLSVDETQTIKTLLFVADIffiPVVALLVGNDHINTVKLKNYLAADFLEPASEEEA 360

Query: 361 KEIFGAGFGSLGPVNLPDSVKIIADRKVQDLANAVSGANQDGYHFTGVNPERDFTAEYVD 420

+ FGAGFGSLGPVNL +I+ADRKVQ+L NAV+GAN+DG+H TGVNP RDF AEYVD 
Sbjct: 361 RAFFGftGFGSLGPVMiAQGSRIVADRKVQNIiT^VAGANKDGFHMTGvMPGRDFC3AEYVD 420 

Query: 421 IREVKEGEISPDGKGTLKFARGIEIGHIFKLGTRYSDSMGANILDENGRSNPIVMGCYGI 480 

IREVKEGE+SPDG G L+FARGIE+GHIFKLGTRYSDSMGA ILDENGR+ PIVMGCYGI 
Sbjct: 421 IREVKEGKMSPDGHGVLQFARGIEVGHIFKLGTRYSDSMGATILDENGRTVPIVMGCYGI 480

Query: 481 GVSRILSAVIEQHARLFVNKTPKGAYRFAWGINFPEEIAPFDVHLITVNVKDQESQDLTE 540 

GVSRILSAVIEQHARLFVNKTPKG YR+AWGINFP+ELAPFDVHLITVNVKDQ +QDLT 
Sbjct: 481 GVSRILSAVIEQHARLFVNKTPKGDYRYAWGINFPKELAPFDVHLITVNVKDQVAQDLTA 540 

Query: 541 KIEADLMLKGYEVLTDDRNERVGSKFSDSDLIGLPIRVTVGKKASEGIVEVKIKASGDTI 600 

K+EADLM KGY+VLTDDRNERVGSKFSDSDLIGLPIRVTVGKKA+EGIVE+KIKA+GD+I 
Sbjct: 541 KLEADLMAKGYDVLTDDRNERVGSKFSDSDLIGLPIRVTVGKKAAEGIVEIKIKATGDSI 600

Query: 601 EVHADNLIETLEILTKK 617 

EV+A+NLIETLEILTK+ 
Sbjct: 601 EVNAENLIETLEILTKE 617 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 



Example 751

A DNA sequence (GBSx0798) was identified in S.agalactiae <SEQ ID 2307> which encodes the amino 
acid sequence <SEQ ID 2308>. This protein is predicted to be peptidoglycan hydrolase (flgj). Analysis of 
this protein sequence reveals the following: 

Possible site: 21

>>> Seems to have an uncleavable N-term signal seq

INTEGRAL Likelihood = -1.86 Transmembrane 9 - 25 ( 9 - 25)

Final Results

bacterial membrane --- Certainty=0.1744 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAB94815 GB:AJ245582 peptidoglycan hydrolase [Streptococcus thermophilus] 
Identities = 101/201 (50%), Positives = 122/201 (60%), Gaps = 9/201 (4%) 

Query: 2 KSRKKDKLVLRLTT TLLVFGL GGWFYNYKNDNVEPTVTSASDQTTTFIQT 52
KS+KK K VL +L+ GIi G + N+ +E +T + T Fl
Sbjct: 16 KSKKKKKSVLLFPKFFQKWSLIFIGLFSLLGLLASLNFPRLTMEKNMTPTDETTVRFIAE 75

Query: 53 [illegible]
Sbjct: 76 [illegible]

Query: 113 [illegible]
Y V +S T SYKDATAALTG+YAT
Sbjct: 136 [illegible]

Query: 173 [illegible]
DT Y KLN HE Y L YD
Sbjct: 196 [illegible]



A related DNA sequence was identified in S.pyogenes <SEQ ID 2309> which encodes the amino acid 
sequence <SEQ ID 2310>. Analysis of this protein sequence reveals the following: 

Possible site: 24

>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases: 



Query: 4 KKGKLVLISLFVLAACLGAYSAMRQSHKISNVSAETIASSSTRHFIDEIGPTASTIGQER 63
+K L+ I LF L L + + R + + + T +T FI EIG T+ +
Sbjct: 32 QKWSLIFIGLFSLLGLLASLNFPRLTMEKNM TPTDETTVAFIAEIGETSRYLAARN 87

Query: 64 [illegible]
DLYASVMIAQAILES +G+S LSQ P YNFFGIKG YNG SVT+ TWEDDG GN Y ID
Sbjct: 88 [illegible]

Query: 124 [illegible]
3 Y+G +S T SY+DATAALTG+YATDT+Y KLN+II
Sbjct: 148 [illegible]

Query: 184 [illegible]
Sbjct: 208 [illegible]

An alignment of the GAS and GBS proteins is shown below:

Identities = 108/192 (56%), Positives = 124/192 (64%), Gaps = 2/192 (1%)

Query: 3 SRKKDKLVL-RLTTTLLVFGLGGWFYNYKNDNVEPTVTSASDQTTTFIQTISPTAIEIS 61

++KK KLVL L G ++K MV T AS T FI I PTA I

Sbjct: 2 TKKKGKLVLISLFVLAACLGAYSAMRQSHKTSNVSAE-TIASSSTRHFIDEIGPTASTIG 60

Query: 62 KTYDLYASVLLAQAILESSSGQSDLSKAPNYNLFGIKGEYKGKSVQMPTLEDDGKGNMTQ 121

+ DLYASV++AQAILESS+G+S LS+AP YN FGIKG Y G SV M T EDDG GN
Sbjct: 61 QERDLYASVMIAQAILESSNGKSSLSQAPYYNFFGIKGAYNGSSVTMSTWEDDGNGNTYT 120

Query: 122 IQAPFRAYPNYSASLYDYAELVSSQKYASVWKSNTSSYKDATAALTGLYATDTAYASKLN 181
I FRAYP+ + SL DYA+L+SS Y KSNT SY+DATAALTGLYATDT+Y KLN
Sbjct: 121 IDQAFRAYPSIADSLNDYADLLSSSTYIGARKSNTLSYQDATAALTGLYATDTSYNLKLN 180 

Query: 182 QIIETYSLDAYD 193 

II TY L AYD 
Sbjct: 181 NIIATYGLTAYD 192 
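Each alignment row carries a start coordinate, a sequence segment and an end coordinate, so the number of residues in the segment should equal end minus start plus one. The following minimal Python sketch, not part of the original disclosure, applies that check to a row; the name `check_row` is hypothetical, and the sketch assumes a row of the form `Query: <start> <segment> <end>` with no spaces inside the segment.

```python
import re

_ROW = re.compile(r"^(Query|Sbjct)\s*:\s*(\d+)\s+(\S+)\s+(\d+)\s*$")


def check_row(line):
    """Return (label, start, end, residues, consistent), or None if unparsable."""
    m = _ROW.match(line.strip())
    if m is None:
        return None
    label, seq = m.group(1), m.group(3)
    start, end = int(m.group(2)), int(m.group(4))
    # Gap characters ("-") do not consume a coordinate, so only letters count.
    residues = sum(1 for ch in seq if ch.isalpha())
    return label, start, end, residues, residues == end - start + 1


query_row = check_row("Query: 182 QIIETYSLDAYD 193")
sbjct_row = check_row("Sbjct: 181 NIIATYGLTAYD 192")
print(query_row)
print(sbjct_row)
```

A row whose residue count disagrees with its coordinates, or which cannot be parsed at all, is a likely site of damage in the extracted text.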

A further related DNA sequence was identified in S.pyogenes <SEQ ID 9073> which encodes the amino 
acid sequence <SEQ ID 9074>. Analysis of this protein sequence reveals the following: 

Possible site: 58

>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial outside --- Certainty=0.3000 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS sequences follows: 



96/169 (56%) , Gaps = 3/169 (1%) 

Query: 30 MWTLKLGNQRLAPY ADHETLTFVRKISHAAQSVAQKKQLYSSVMMAQAILESNNGKS 86

+W N + P A +T TF++ IS A +++ LY+SV++AQAILES++G+S

Sbjct: 25 WFYNYKNDNVEPTVTSASDQTTTFIQTISPTAIEISKTYDLYASVLLAQAILESSSGQS 84

Query: 87 QLSQKPYYNFFGIKGSYKERSVIFPTLEDDGCGNLYQIDAAFRSYGSLTACFLDYARVLN 146 

LS+ P YN FGIKG YK +SV PTLEDDG+GN+ QI A FR+Y + +A DYA +++ 
Sbjct: 85 DLSKAPNYNLFGIKGEYKGKSVQMPTLEDDGKGNMTQIQAPFRAYPNYSASLYDYAELVS 144 

Query: 147 DPLYDKTHiaCFWSIIYQXXXXXXffiQQOC^ 195 

Y K S Y+ KLN++IE Y L +D 

Sbjct: 145 SQKYASVWKSNTSSYKDATAALTGLYATDTAYASKLNQIIETYSLDAYD 193 

A further related DNA sequence was identified in S.pyogenes <SEQ ID 9075> which encodes the amino 
acid sequence <SEQ ID 9076>. An alignment of the GAS and GBS sequences follows: 

79/151 (51%) , Gaps = 10/151 (6%) 

Query: 2 TFLDKIKQGCLDGWAKYKILPSLTAAQAILESGWGKE APHNALFGIKADSSWTGKS 57 

TF+ I ++ Y + S+ AQAILES G+ AP+ LFGIK + + GKS 

Sbjct: 48 TFIQTISPTAIEISKTYDLYASVLLAQAILESSSGQSDLSKAPNYNLFGIKGE--YKGKS 105 

Query: 58 FDTKTQEEYQAGWTDIVDRFRAYDS5«JDESIADHGQF1 J VDNPRYEAV--IGETDYKKACY 115 

T E+ G +T I FRAY ++ S+ D+ + LV + +Y +V + YK A 

Sbjct: 106 VQMPTLEDDGKGNMTQIQAPFRAYPNYSASLYDYAE-LVSSQKYASVWKSNTSSYKDATA 164 

Query: 116 AIKAAGYATASSYVELLIQLIEENDLQSWDR 146 

A+ YAT ++Y L Q+IE L ++D+ 
Sbjct: 165 ALTGL-YATDTAYASKLNQIIETYSLDAYDK 194

SEQ ID 2308 (GBS275) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 52 (lane 4; MW 22.6kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 58 (lane 4; MW 47.5kDa). 

The GBS275-GST fusion product was purified (Figure 208, lane 5) and used to immunise mice. The 
resulting antiserum was used for FACS (Figure 276), which confirmed that the protein is immunoaccessible 
on GBS bacteria.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 752 

A DNA sequence (GBSx0799) was identified in S.agalactiae <SEQ ID 2311> which encodes the amino
acid sequence <SEQ ID 2312>. Analysis of this protein sequence reveals the following:

Possible site: 27

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -0.16 Transmembrane 876 - 892 ( 876 - 892)

Final Results

bacterial membrane --- Certainty=0.1065 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related DNA sequence was identified in S.pyogenes <SEQ ID 2313> which encodes the amino acid
sequence <SEQ ID 2314>. Analysis of this protein sequence reveals the following:

Possible site: 48

Final Results

bacterial membrane --- Certainty=0.1065 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
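Every "Final Results" block in these examples lists three candidate localisations, each with a certainty value, and the highest value gives the predicted localisation. The following minimal Python sketch, not part of the original disclosure, reads such a block and picks that localisation. The names `parse_psort_results` and `predicted_site` are hypothetical, and the pattern deliberately tolerates spaces inside the number because the extracted text often shows forms such as `Certainty=0 . 1744`.

```python
import re

# Tolerates OCR spacing inside the number, e.g. "Certainty=0 . 1744".
_PSORT_LINE = re.compile(
    r"bacterial\s+(?P<site>membrane|outside|cytoplasm)\b.*?"
    r"Certainty\s*=\s*(?P<value>[0-9][0-9 .]*?)\s*\((?P<call>[^)]*)\)"
)


def parse_psort_results(text):
    """Return {site: (certainty, call)} for each result line that can be read."""
    results = {}
    for line in text.splitlines():
        m = _PSORT_LINE.search(line)
        if m:
            value = float(m.group("value").replace(" ", ""))
            results[m.group("site")] = (value, m.group("call").strip())
    return results


def predicted_site(results):
    """Return the site with the highest certainty value."""
    return max(results, key=lambda site: results[site][0])


block = """bacterial membrane Certainty=0 . 1744 (Affirmative) < succ;
bacterial outside Certainty=0 . 0000 (Not Clear) < suco
bacterial cytoplasm Certainty=0. 0000 (Not Clear) < suco"""
results = parse_psort_results(block)
print(results, predicted_site(results))
```

Lines in which the word "Certainty" or the leading digit is itself damaged are skipped rather than guessed at.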



The protein has homology with the following sequences in the databases:

>GP:CAB94815 GB:AJ245582 peptidoglycan hydrolase [Streptococcus thermophilus]
Identities = 96/202 (47%), Positives = 127/202 (62%), Gaps



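Query: 4 [illegible]
Sbjct: 15 [illegible]

Query: 54 [illegible]
Sbjct: 75 [illegible]

Query: 114 [illegible]
EDDG+GN Y IDAAFRSYGS+ DY L Y H+ Y+DATA LTG YA
Sbjct: 135 [illegible]

Query: 174 [illegible]
TDTTY KLN +IE YQLT +D
Sbjct: 195 [illegible]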



An alignment of the GAS and GBS proteins is shown below: 

Identities = 1244/1468 (84%), Positives = 1351/1468 (91%), Gaps = 3/1468 (0%) 

Query: 1 MSELFKKLMDQIEMPLEIKNSSVFSSADIIEVKVHSLSRLWEFHFSFPELLPIEVYRELQ 60 

MS+LF KLMDQIEMPL+++ SS FSSADIIEVKVHS+SRLWEFHF+F +LPI YREL 
Sbjct: 1 MSDLFAKLMDQIEMPLDMRRSSAFSSADIIEVKVHSVSRLWEFHFAFAAVLPIATYRELH 60

Query: 61 TRLWSFEKADIK^TFDIRAETIDFSDDLI^YYQQAFCEPLCNSASFKSSFSQLKVHYII 120 

RL+ +FE ADIK TFDI+A +D+SDDLLQ YYQ+AF CNSASFKSSFS+LKV Y 
Sbjct: 61 DRLIRTFFJ^IKVTFDIQAAQVDYSDDLLQAYli'QFAFSHAPCNSASFKSSFSKLKVTYE 120 

Query: 121 GSQMIISAPQFvNNNHFRQNHLPRLEQQFSLFGFGKLAID^lVSDEQMTQDLKSSFETNRE 180 

++II+AP FVNN+HFR NHLP L +Q FGFG L IDMVSD++MT+ L +F ++R+
Sbjct: 121 DDKLIIAAPGFVl^mHFRMIIHLPNLVKQLEAFGFGILTIDMVSDQEMTEHLTKNFVSSRQ 180

Query: 181 / Sbjct: 181
Query: 241 / Sbjct: 238
Query: 301 / Sbjct: 298
Query: 361 / Sbjct: 358
Query: 421 / Sbjct: 418
Query: 481 / Sbjct: 478
Query: 541 / Sbjct: 538
Query: 601 / Sbjct: 598
Query: 661 / Sbjct: 658
Query: 721 / Sbjct: 718
Query: 781 / Sbjct: 778

TEENRIVFEGMVF VERKTTRTGRHIINFKMTDYTSS?A+QKWAKDDEEL+K+DMI+KG

+WLRV+GNIE N FTKSLTMNVQ +KEIV HERKDLMP QKRVE HAHTNMSTMDALPT

LTGITD H++G+KP++ VL+ FQ-I-FC+ ++LVAHNA+FDVGFMNANYERH+LP ITQPVI

DTLEFARNLYPEYKRHGLGPLTKRFQV+L+HHHMANYDAEATGRLLFIFLK+ARE

NL++LNT IiVAEDSYKKARIKHATIYVQNQVGLKN+FKLVSLSN+KYFEGV RIPR+VLD

AHREGLLLGTACSDGEVFDA+L+ GIDAAV IA+YYDFIE+MPPAIY+PLWR+LIKD+

GI+Q+IRDLIEVG+R KPVIATGNVHY4EPE+EIYREIIVRSLGQGAMINRTIGRGE A

QPAPLPKAHFRTTNEMLDEFAFLGKDLAY++W NT FADR E+VEWKGDLYTP++D+

Query: 841 [illegible]SILGNGFAVIYLASQMLVQRSNERGY 900
AEE VAELTY KAFEIYGNPLPDIIDLRIEKEL SILGNGFAVIYLASQMLV RSNERGY
Sbjct: 838 AEETVAELTYQKAFEIYGNPLPDIIDLRIEKELTSILGNGFAVIYLASQMLVNRSNERGY 897

Query: 901 / Sbjct: 898
Query: 961 / Sbjct: 958

T Y+KDGQDI PFETFLGFDGDKVPDIDLNFSGDDQPSAHLDVRDI FG+EYAFRAGTVGTV



Query: 1021 AEKTAFGFVKGYERDYNKFYNDAEVERIATGAAGVKR3TGQHPGGIWIPNYMDVYDFTP 1080 

AEKTA+GFVKGYERDY KEY DAEV+RLA GAAGVKR+TGQHPGGI WI PNYMDVYDFTP 
Sbjct: 1018 AEKTAYGFVKGYERDYGKFYRDAEVDPJAAGAAGVKRTTGQHPGGIWI PNYMDVYDFTP 1077 

Query: 1081 VQYPADDMTAAWQTTHFNFHDIDENVLKLDILGHDDPTMIRKLQDLSGIDPSNILPDDPD 1140 

VQYPADD+TA+WQTTHFNFHDIDENVLKLDILGHDDPTMIRKLQDLSGIDP I DDP
Sbjct: 1078 VQYPADDVTASWQTTHFNFHDIDENVLKLDILGHDDPTMIRKLQDLSGIDPITIPADDPG 1137

Query: 1141 VMKLFSGTEVLGVTEEQIGTPTGMLGIPEFGTNFVRGMVNETHPTTFAELLQLSGLSHGT 1200

VM LFSGTEVLGVT EQIGTPTGMLGIPEFGTNFVRGMVNETHPTTFAELLQLSGLSHGT
Sbjct: 1138 VMALFSGTEVLGVTPEQIGTPTGMLGIPEFGTNFVRGMVNETHPTTFAELLQLSGLSHGT 1197

Query: 1201 DVWLGNAQDLIKEGIATLSTVIGCRDDIMVYLMHAGLQPKMAFTIMERVRKGLWLKISED 1260

DVWLGNAQDLIKEGIATL TVIGCRDDIMVYLMHAGL+PKMAFTIMERVRKGLWLKISE+
Sbjct: 1198 DVWLGNAQDLIKEGIATLKTVIGCRDDIMVYLMHAGLEPKMAFTIMERVRKGLWLKISEE 1257

Query: 1261 ERNGYIQAMRDIOTPDWYIESCKKIKYMFPKaHAAAWIJ^RVAYFKOT^PIFYYCAyF 1320 

ERNGYI AMR+MWPDWYIESCGKIKYMFPKAHAAAYVLMALRVAYFKVH+PI YYCAYF 
Sbjct: 1258 ERNGYIDAMREIWPDWYIESCGKIKYMFPKAHAAAY^MALRVAYFKVHHPIMYYCAYF 1317 

Query: 1321 SIRAKAFELRTMSAGLDAVKARMKDITEKRQR^EATNVElIDLFTTLELVNEMLERGFiCFG 1380 

SIRAKAFEL+TMS GLDAVKARM+DIT KR+ NEATKVENDLFTrLE+VNEMLERGFKFG 
Sbjct: 1318 SIRAKAFELKTMSGGLDAVKARMEDITIKRKNNEATNVEHDLFTTLEIVNEMLERGFKFG 1377 

Query: 1381 KLDLYRSHATDFIIEEDTLIPPFVAMEGLGENVAKQIVRAREDGEFLSKTELRKRGGVSS 1440 

KLDLY+S A +F 1+ DTLIPPF+A+EGLGENVAKQIV+AR++GEFLSK ELRKRGG SS 
Sbjct: 1378 KLDLYKSDAIEFQIKGDTLIPPFIALEGLGENVAKQIVKARQEGEFLSKMELRKRGGASS 1437 

Query: 1441 TLVEKFDEMGILGNLPEDNQLSLFDDFF 1468 

TLVEK DEMGILGN+PEDNQLSLFDDFF
Sbjct: 1438 TLVEKMDEMGILGNMPEDNQLSLFDDFF 1465 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 753 

A DNA sequence (GBSx0800) was identified in S.agalactiae <SEQ ID 2315> which encodes the amino 
acid sequence <SEQ ID 2316>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1505 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10179> which encodes amino acid sequence <SEQ ID
10180> was also identified.

The protein has homology with the following sequences in the GENPEPT database: 

>GP:CAB13207 GB:Z99111 similar to transcriptional regulator (MarR 
family) [Bacillus subtilis] 
Identities = 49/124 (39%) , Positives = 73/124 (58%) 

Query: 18 VMRKAFRTIDGKVSESFKEFELTPTQFAVLDVLYAKGTMKIGELIENMLATSGNMTVVIK 77

V +AF+++ KE PT+FAVL++LY +G K+ ++ +L SGN+T VI

Sbjct: 20 VFARAFKSVSEHSIRDSKEHGFNPTEFAVLELLYTRGPQKLQQIGSRLLLVSGNVTYVID 79

Query: 78 NMEKKGWVLRHSCPNDKRAFLVSLTTEGEEVIKKALPEHIKRVEDAFSVLTETEQEDLIK 137

+E+ G+++R P DKR+ LT +G E + K P H R+ AFS L+ EQ+ LI 
Sbjct: 80 KLERNGFLVREQDPKDKRSVYAHLTDKGNEYLDKIYPIHALRIARAFSGLSPDEQDQLIV 139 

Query: 138 LLKK 141 
LLKK 

Sbjct: 140 LLKK 143 

A related DNA sequence was identified in S.pyogenes <SEQ ID 2317> which encodes the amino acid 
sequence <SEQ ID 2318>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.0537 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below: 

Identities = 80/145 (55%) , Positives = 111/145 (76%) , Gaps = 1/145 (0%) 

Query: 2 GDEMGNF-KNSAVKSMWMRKAFRTIDSKVSESFKEFELTPTQFAVLDVLYAKGTMKIGE 60

G++M + KN+A+K+MW RKA RT+D ++ FK+ +LT TQF+VL+VLY KG M+I
Sbjct: 8 GNQMSHLDKNTALKAMWFRKAQRTLDAFGADIFKKADLTATQFSVLEVLYTKGCMRINH 67

Query: 61 LIENMLATSGNMTVVIKNMEKKGWVLRHSCPNDKRAFLVSLTTEGEEVIKKALPEHIKRV 120

LI+++LATSGNMTVV+ NME+ GW+ + DKRA++V+LT +G +1+ LP+H+ RV

Sbjct: 68 LIDSLIATSGNMTWLJMERNGWISKCKDKTDKRA^ 127 

Query: 121 EDAFSVLTETEQEDLINLLKKFKTL 145 

E+AF+VLTE EQ LI LLKKFK L 
Sbjct: 128 EEAFAVLTEKEQLCLIELLKKFKQL 152 
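In each alignment the middle line echoes a residue where Query and Sbjct are identical. The following minimal Python sketch, not part of the original disclosure, rebuilds a simplified middle line and counts the identities for the final row of the alignment above. The name `simplified_midline` is hypothetical. The `+` marks for conservative substitutions are deliberately not reproduced, since they depend on the scoring matrix used in the original search, which is not stated here.

```python
def simplified_midline(query, sbjct):
    """Return (identities, midline) for two equal-length aligned rows.

    Identical residues are echoed; every other column becomes a space.
    """
    if len(query) != len(sbjct):
        raise ValueError("aligned rows must have the same length")
    midline = "".join(
        q if q == s and q != "-" else " " for q, s in zip(query, sbjct)
    )
    identities = sum(1 for ch in midline if ch != " ")
    return identities, midline


query = "EDAFSVLTETEQEDLINLLKKFKTL"  # Query residues 121-145
sbjct = "EEAFAVLTEKEQLCLIELLKKFKQL"  # Sbjct residues 128-152
identities, midline = simplified_midline(query, sbjct)
print(identities, repr(midline))
```

Summing the identity counts over all rows of an alignment gives a figure that can be compared with the count reported in its header.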

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 



Example 754 

A DNA sequence (GBSx0801) was identified in S.agalactiae <SEQ ID 2319> which encodes the amino 
acid sequence <SEQ ID 2320>. Analysis of this protein sequence reveals the following: 
Possible site: 46

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.3742 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAG05963 GB:AE004686 hypothetical protein [Pseudomonas aeruginosa]

Identities = 115/203 (56%), Positives = 143/203 (69%), Gaps = 7/203 (3%)

Query: 2 SFLEELKNRRSIYALGRNTEVSDEKIVEIIKEAVRQSPSAFNSQTSRVVILLNDEVTKFW 61
+FL +KNRR+IYAL + VS EKIVE++KEAV SPSAFNSQ+SRVV+L E +FW
Sbjct: 4 AFLSSIKNRRTIYALDKQLPVSQEKIVELVKEAVSHSPSAFNSQSSRVVVLFGAEHEQFW 63

Query: 62 DELVANDLVETMPCVQGAPETAIAGTJCEKriASFGASKGTVLFFEDQDWKSLQEQFVLYAD 121
+ +A D E K+ P A A T+ KL SF A GTVLFFEDQ W+ LQEQF LYAD
Sbjct: 64 N--IAKD--ELKKI--VPADAFAATETKLNSFAAGAGTVLFFEDQTVVRQLQEQFALYAD 117

Query: 122 NFPVWSEQSTGIASVWTWTALSAEIXlLGGNLQHYNPVIDASVQAvYGVPASWKLRGQ 181
NFP VWSEQ+ +G+A VITAL AE +G +LQHYNP++DA + +P SWKLR Q+ F
Sbjct: 118 NFPWSEQASGMAQFAVWTAL-AEHKVGASLQHYNPLVDAQTHKTWNLPESWKLRAQMPF 176

Query: 182 GSIEAETGEKEFMNDDDRFKVIG 204
G+I A GEK F+ + +RFKV G
Sbjct: 177 GAIAAPAGEKAFIAESERFKVFG 199



No corresponding DNA sequence was identified in S.pyogenes. 
Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 755

A DNA sequence (GBSx0802) was identified in S.agalactiae <SEQ ID 2321> which encodes the amino 
acid sequence <SEQ ID 2322>. Analysis of this protein sequence reveals the following: 

Possible site: 58

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2730 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB62846 GB:AL035475 hypothetical protein [Plasmodium falciparum]
(ver 2) 

Identities = 112/529 (21%) , Positives = 217/529 (40%) , Gaps = 67/529 (12%) 

Query: 3 NKKHKLLKN1EEFKTITQKRLTERGKFPYDTVKSTFEIKDENPIMERLKSSGLSMGKP- - 60 

N K+ +K + ++ Q + E+ KF D H E + E FI E + + K 
Sbjct: 1063 NVKYNEMKGAKN-DSLNQNEIIEKEKF--DLQH ENRSERFIEEEKQICIVDDKKNNI 1116 

Query: 61 --VDYMGVNGIPIYTKTLSIVNKFAFENNSKDSSYSSNINISEDKIKENDQKILDLIVKS 118 

VD + PY + L+ +N + YS+ DKI +N++ ++ K 

Sbjct: 1117 MNVDEKRKSDHPSYERVLKMEG SNKNEEGYSHT DKILKNEKNEKNVNEKK 1166 

Query: 119 GANNQNLTDEEKVIAFTKYIGEITTWDNEAYFJUUJVnTEYYRASDLFSVTERPCIiAMCVGY 178 

G N++ +E+K K + E + ++E D + F +C 

Sbjct: 1167 GENDEKNEiraKKEEroSKNVNEKKDENDEKNEM 1226 

Query: 179 SVTAARAraiMGIPSYWSGKSPQGISHAAVRAYYNRSWHIIDITASTYWKNGNYKTTYS 238 

+ N + IPS ++ +GI + N S 1+ KN N ++ YS 

Sbjct: 1227 LIFINNKKNSILIPS ENEKGIIGSQKEEEQNISPVKINNKKKDLCKNIN-ESDYS 1280 

Query: 239 DFIKEYCIDGYD- -VYDPAKT^FK-VKYMESNEAFENWIHNNGSKSML FIN 288 

D ++ + +Y +N++ 4- ++ +NE + + +N S++ L ++ 

Sbjct: 1281 DKQYSVLI^SIEKKIYKKCSSNSKIRGIEKKKINEDYvI)LKNINCSRNTLEFFIjTKKYLK 1340 

Query: 289 ESAALKDKKPKDDFVPVTEKEKNELIDKYKKLLSQIPENTQNPGEKNIRDYLKNEYEEIL 348 

S + ++ + V EK+K 4 KKKL+IN P+I + + +EY + 
Sbjct: 1341 SSELIINEHDCQNINNVYEKKKKKEQAK-KKIiNRKI - -NVNIPNDSIIEENMSSEYNFVK 1397 

Query: 349 KKDN LFEHEHAE FKESIjNLNESFYLQLKKEE - -MKPSDNLKKEE 390 

KK+N FE + ++ F N + L +E+ ++ +N K+ E 

Sbjct: 1398 KKNrOTCMVKFETKRSKSILSSEIFAVXKNKKRATWLMRSEEQFISSIGLVEKGENKKRIE 1457 

Query: 391 KPRENSVKERETPAENNDFVSVTEKNNLIDKYKELLSKIPENTQNPGEKNIRN--YLEKE 448 

+ E +KE+ + N+F KNNL ++ L K EN G N ++++ 

Sbjct: 1458 EKDEEYIKEK- IKNKKNEF KNNLTEQL--LFFKSAENINTSGSFNTEKIRHVKRT 1509 

Query: 449 YEELLQKDKLFKHEYTEFTKSLNLNETFYSQLKEGEMKLSENPEKGETN 497 

++ + + ++ K L E ++ E + ++++N EKGE N 
Sbjct: 1510 KRKViaSOTFIIJOTFSNILKKLQRMEEDKll<>1DEQKKEINKNNEKGEFN 1558 

There is also homology to SEQ ID 598. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 756 

A DNA sequence (GBSx0803) was identified in S.agalactiae <SEQ ID 2323> which encodes the amino 
acid sequence <SEQ ID 2324>. Analysis of this protein sequence reveals the following:

Possible site: 22

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1243 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 757 

A DNA sequence (GBSx0804) was identified in S.agalactiae <SEQ ID 2325> which encodes the amino 
acid sequence <SEQ ID 2326>. This protein is predicted to be 2-dehydro-3-deoxyphosphogluconate
aldolase/4-hydroxy-2-oxoglutarate al. Analysis of this protein sequence reveals the following:

Possible site: 49

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1057 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAD35160 GB:AE001693 2-dehydro-3-deoxyphosphogluconate
aldolase/4-hydroxy-2-oxoglutarate aldolase [Thermotoga maritima]
Identities = 78/192 (40%), Positives = 118/192 (60%), Gaps = 6/192 (3%)

Query: 14 KIVAVIRGNSQEEAFQAAQACIKGGISAIEIAYTNSKASQVIEQLVTQYTNQEQVVVGAG 73
KIVAV+R NS EEA + A A +GG+ IEI +T A VI++L + ++ ++GAG
Sbjct: 11 KIVAVLRANSVEEAKEKALAVPEGGVHLIEITFTVPDADTVIKEL--SFLKEKGAIIGAG 68

Query: 74 TVLDSETARmiLAGAKFIVSPAFNLQTAKLCNRYAIPYLPGCMTLSEVTTALEAGCElI 133
TV E R A+ +GA+FIVSP + + ++ C + Y+PG MT +E+ A++ G 1+
Sbjct: 69 TVTSVEQCRKAVESGAEFIVSPHLDEEISQFCKEKGVFYMPGVMTPTELVKAMKLGHTIL 128

Query: 134 KIFPGGTLGTSFISSLKAPLPQVQIIWTGGVNLTNAKDWFLSGVTAIGIGGEFNKLAALG 193
K+FPG +G F+ ++K P P V+ + TGGVNL N +WF +GV A+G+G K G
Sbjct: 129 KLFPGEVVGPQFVKAMKGPFPNVKFVPTGGVNLDNVCEWFKAGVLAVGVGSALVK----G 184

Query: 194 EFDKITEMAKQY 205
D++ E AK +
Sbjct: 185 TPDEVREKAKAF 196

There is also homology to SEQ ID 1252. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 758 

A DNA sequence (GBSx0805) was identified in S.agalactiae <SEQ ID 2327> which encodes the amino
acid sequence <SEQ ID 2328>. This protein is predicted to be 2-keto-3-deoxygluconate kinase. Analysis of
this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.4213 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAD35161 GB:AE001693 2-keto-3-deoxygluconate kinase [Thermotoga maritima]
Identities = 94/329 (28%) , Positives = 169/329 (50%) , Gaps = 7/329 (2%) 

Query: 3 KILFFGEPLlRITPKEfflDYFADSISTKLFYGGSEVNTARALQGFGQDTKLLSALPNNPIG 62 

K++ FGE ++R++P ++ + S + YGG+E N A I, G D ++ LPNNP+G 

Sbjct: 2 KVWFGEIMLRLSPPDHKRIFQTDSFDOTYGGAEAWAAFLAQMGLDAYFVTKLPNNPLG 61

Query: 63 NSFLQFLKAQGIDTHSIQWVGERVGLYFLEDSFACRKGSWYDRDHSSLHDFRINQIDFD 122 

++ L+ G+ T I G R4-G+YFLE + R +WYDR HS++ + + D++ 
Sbjct: 62 DAAAGHLRKFGVKTDYIARGGNRIGIYFLEIGASQRPSKVVYDRAHSAISEAKREDFDWE 121 

Query: 123 QLFEGVSLFHFSGITLSLDESIQEITLLLLKEAKKREITISLDLNFRSKLISPKNAKILF 182 

++ +G FHFSGIT L + + I LK A ++ +T+S DLN+R++L + + A+ + 
Sbjct: 122 KILDGARWFHFSGITPPLGICELPLILEDALKVANEKGVTVSCDLNYRARLWTKEEAQKVM 181 

Query: 183 SQFATFAD I CFG IEPLMVDSQDTTFFNRDEATIEDVKERMISLINHFDFQVIFHTK 238 

F + D+ IE ++ S + + E + + ++F+ + T 

Sbjct: 182 IPFMEYVDVLIANEEDIEKVLGISVEGLDLKTGKLNREAYAKIAEEVTRKYNFKTVGITL 241

Query: 239 RLQDEWGRNHYQAYI-ANRKQEFVTSKEITTAWQRIGSGDAFVAGALYQLLQHSDSKTV 297

R N++- + N + F EI + R+G+GD+F +Y L DS+

Sbjct: 242 RESISATVNYWSVMVFENGQPHFSNRYEI--HIVDRVGAGDSFAGALIYGSLMGFDSQKK 299

Query: 298 IDFAVASASLKCALEGDNMFETVTAVNKV 326 

+FA A++ LK + GD + ++■ + K+ 
Sbjct: 300 AEFAAAASCLKHTIPGDFWLSIEEIEKL 328 

There is also homology to SEQ ID 1264. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 759 

A DNA sequence (GBSx0806) was identified in S.agalactiae <SEQ ID 2329> which encodes the amino 
acid sequence <SEQ ID 2330>. Analysis of this protein sequence reveals the following: 

Possible site: 16

>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -0.22 Transmembrane 53 - 69 ( 53 - 70)

Final Results

bacterial membrane --- Certainty=0.1086 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAD36157 GB:AE0017S8 sugar-phosphate isomerase [Thermotoga maritima] 
Identities = 41/125 (32%), Positives = 61/125 (48%), Gaps = 10/125 (8%) 

Query: 1 MKIALINENSQASKNTIIYKELKAVSDEKGFEVFNYGMYGKEEESQLTYVQNGLLTAILL 60

MKIA+ ++++ + +++K KG EV ++G Y +E Y + ++ +IL

Sbjct: 1 MKIAIASDHAAFE LKEKVKNYLLGKGIEVEDHGTYSEESVDYPDYAKK-WQSILS 55

Query: 61 HSGAADFVITGCGTGIGAMLACNSFPGWCGFAADPVDAYLFSQVNGGNALSLPFAKGFG 120

H ADF I CGTG+G +AN + G+ PAL N N L LP G
Sbjct: 56 NE--ADFGILLCGTGLGMSIAANRYRGIRAALCLFPDMARLARSHMXIANILVLP GRL 110

Query: 121 WGAEL 125

GAEL

Sbjct: 111 IGAEL 115

A related DNA sequence was identified in S.pyogenes <SEQ ID 2331> which encodes the amino acid
sequence <SEQ ID 2332>. Analysis of this protein sequence reveals the following:

Possible site: 13

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2599 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 159/212 (75%), Positives = 186/212 (87%)

Query: 1 MKIALINENSQASKNTIIYKELKAVSDEKGFEVFNYGMYGKEEESQLTYVQNGLLTAILL 60

MKIALINENSQA+KN IIY L V+D+ G++VFNYGMYG E ESQLTYVQNGLL +ILL
Sbjct: 1 MKIALINENSQAAKNGIIYDALTTVTDKHGYQVFNYGMYGTEGESQLTYVQNGLLASILL 60

Query: 61 NSGAADFVITGCGTGIGAMLACNSFPGWCGFAADPVDAYLFSQVNGGNALSLPFAKGFG 120

+ AADFV+TGCGTG+GAMLA NSFPGV CGFA++P +AYLFSQ+NGGNALS+PFAKGFG
Sbjct: 61 TTKAADFVVTGCGTGVGAMLALNSFPGVTCGFASEPTEAYLFSQINGGNALSIPFAKGFG 120

Query: 121 [illegible]
WGAELNL +FERLF + GGGYPKERA+PEQRNARILS++K+ITYRDLL+++K+IDQDF
Sbjct: 121 WGAELNLTLIFERLFAEPMGGGYPKERAIPEQRNARILSDLKKITYRDLLAIVKDIDQDF 180

Query: 181 LKETISGEHFQEYFFANCQNQNIADYLKSVLD 212
LKETISG HFQEYFFAN + + YLKSVL+

Sbjct: 181 LKETISGAHFQEYFFANAEPSELVTYLKSVLE 212

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 760

A DNA sequence (GBSx0807) was identified in S.agalactiae <SEQ ID 2333> which encodes the amino 
acid sequence <SEQ ID 2334>. Analysis of this protein sequence reveals the following: 

Possible site: 23

>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -0.37 Transmembrane 10 - 26 ( 8 - 26)

Final Results

bacterial membrane --- Certainty=0.1150 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 761

A DNA sequence (GBSx0808) was identified in S.agalactiae <SEQ ID 2335> which encodes the amino
acid sequence <SEQ ID 2336>. This protein is predicted to be gluconate 5-dehydrogenase (fabG). Analysis
of this protein sequence reveals the following:

Possible site: 35

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.1117 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAC77223 GB:AE000497 5-keto-D-gluconate 5-reductase [Escherichia
coli K12]

Identities = 116/260 (44%) , Positives = 165/260 (62%) , Gaps = 6/260 (2%) 

Query: 6 LKDNFSLEGKVALITGASYGIGFSIATAFARAGATIVFNDIKQELVDKGISAYKKLGIKA 65
+ D FSL GK LITG++ GIGF +AT + GA 1+ NDI E + + + GI+A
Sbjct: 1 MNDLFSLAGKNILITGSAQGIGFLLATGLGKYGAQIIINDITAERAELAVEKLHQEGIQA 60

Query: 66 HGYVCDVTDEDGINEMVDKISQDVGVIDILVNNAGIIKRTPMLEMSAADFRQVIDIDLNA 125

+VT + 1+ V+ I +D+G ID+LVNNAGI +R P E ++ VT ++ A
Sbjct: 61 VAAPFNVTHKHEIDAAVEHIEKDIGPIDVLVNNAGIQRRHPFTEFPEQEWNDVIAVNQTA 120

Query: 126 PFIVSKAVLPGMIQKGHGKIINICSMMSELGRETVAAYAAAKGGLKMLTKNIASEYGSAN 185 

F+VS+AV M+++ GK+INICSM SELGR+T+ YAA+KG +KMLT+ + E N 
Sbjct: 121 VFLVSQAVTRH^RKAGKVINICSMQSEI^RDTITPYAASKGAVKMLTRGMCVELARHN 180 

Query: 186 IQCNGIGPGYIATPQTAPLRERQDDGSRHPFDQFIIAKTPAARWGEAEDLGAPAIFIASD 245 

IQ NGI PGY T T L E + F ++ +TPAARWG+ ++L A+FL+S 

Sbjct: 181 IQVNGIAPGYFKTEMTKALVEDE AFTAWLCKRTPAARWGDPQELIGAAVFLSSK 234 



Query: 246 ASNFINGHILYVDGGILAYI 265 
35 AS+F+NGH+L+VDGG+L + 

Sbjct: 235 ASDFVNGHLLFVDGGMLVAV 254 



There is also homology to SEQ ID 1242:

Identities = 225/264 (85%), Positives = 246/264 (92%)

Query: 6 LKDNFSLEGKVALITGASYGIGFSIATAFARAGATIVFNDIKQELVDKGISAYKKLGIKA 65
+++ FSL+GK+ALITGASYGIGF IA A+A+AGATIVFNDIKQELVDKG++AY++LGI+A
Sbjct: 1 MENMFSLQGKIALITGASYGIGFEIAKAYAQAGATIVFNDIKQELVDKGLAAYRELGIEA 60

Query: 66 HGYVCDVTDEDGINEMVDKISQDVGVIDILVNNAGIIKRTPMLEMSAADFRQVIDIDLNA 125
HGYVCDVTDE GI +MV +I +VG IDILVNNAGII+RTPMLEM+A DFRQVIDIDLNA
Sbjct: 61 HGYVCDVTDEAGIQQMVSQIEDEVGAIDILVNNAGIIRRTPMLEMAAEDFRQVIDIDLNA 120

Query: 126 PFIVSKAVLPGMIQKGHGKIINICSMMSELGRETVAAYAAAKGGLKMLTKNIASEYGSAN 185
PFIVSKAVLP MI KGHGKIINICSMMSELGRETV+AYAAAKGGLKMLTKNIASE+G AN
Sbjct: 121 PFIVSKAVLPSMIAKGHGKIINICSMMSELGRETVSAYAAAKGGLKMLTKNIASEFGEAN 180

Query: 186 IQCNGIGPGYIATPQTAPLRERQDDGSRHPFDQFIIAKTPAARWGEAEDLGAPAIFLASD 245
IQCNGIGPGYIATPQTAPLRERQ DGSRHPFDQFIIAKTPAARWG EDL PA+FLASD
Sbjct: 181 IQCNGIGPGYIATPQTAPLRERQADGSRHPFDQFIIAKTPAARWGTTEDLAGPAVFLASD 240

Query: 246 ASNFINGHILYVDGGILAYIGKQP 269
ASNF+NGHILYVDGGILAYIGKQP
Sbjct: 241 ASNFVNGHILYVDGGILAYIGKQP 264

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.
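
The identity counts reported for the alignments in Example 761 can be re-derived directly from the aligned residue rows. The following is a minimal sketch, not part of the original analysis: the helper name `count_identities` is hypothetical, and the two rows are the final block (query 246-265, subject 235-254) of the alignment against the E. coli 5-keto-D-gluconate 5-reductase.

```python
def count_identities(query: str, subject: str) -> int:
    """Count columns where both aligned rows carry the same residue.

    Hypothetical helper for illustration; '-' marks a gap and never
    counts as an identity.
    """
    if len(query) != len(subject):
        raise ValueError("aligned rows must have equal length")
    return sum(1 for q, s in zip(query, subject) if q == s and q != "-")


# Final block of the Example 761 alignment against GP:AAC77223.
query_row = "ASNFINGHILYVDGGILAYI"
subject_row = "ASDFVNGHLLFVDGGMLVAV"
identical = count_identities(query_row, subject_row)
columns = len(query_row)
```

Summing such per-block counts over all blocks of an alignment should reproduce the numerator of the "Identities" figure, which makes it a cheap check on transcription errors in the residue rows.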

Example 762 

A DNA sequence (GBSx0809) was identified in S.agalactiae <SEQ ID 2337> which encodes the amino
acid sequence <SEQ ID 2338>. This protein is predicted to be mannose-specific phosphotransferase system
component IIAB. Analysis of this protein sequence reveals the following:

Possible site: 24

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.0886 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AM>46485 GB:AF130465 mannose-specific phosphotransferase system
component IIAB [Streptococcus salivarius]
Identities = 43/107 (40%), Positives = 61/107 (55%), Gaps = 3/107 (2%)

Query: 2 IKIIIVAHGNFPDGILSSLELIAGHQEYVVGINFIAGMSSNDVRVALQREVIDFK---EI 58
I III +HG F +GI S +I G QE V + F+ +D+ + F EI
Sbjct: 3 IGIIIASHGKFAEGIHQSGSMIFGDQEKVQVVTFMPSEGPDDLYAHFNDAIAQFDADDEI 62

Query: 59 LVLTDLLGGTPFNVSSALSVEYTDKKIKVLSGLNLSMLMEAVLSRTM 105
LVL DL G+PFN +S ++ E D+KI +++GLNL ML++A R M
Sbjct: 63 LVLADLWSGSPFNQASRIAGENPDRKIAIITGLNLPMLIQAYTERMM 109

A related DNA sequence was identified in S.pyogenes <SEQ ID 2339> which encodes the amino acid
sequence <SEQ ID 2340>. Analysis of this protein sequence reveals the following:

Possible site: 41

>>> Seems to have an uncleavable N-term signal seq

Final Results

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

>GP:AAF81086 GB:AF228498 AgaF [Escherichia coli]
Identities = 48/127 (37%), Positives = 71/127 (55%), Gaps = 6/127 (4%)



Query: 1 MIAIIVMGHGHFASGIVSALELIAGKQEKATAIDFTTEKTAADVQDQLSRALIP---EEE 57
M++II+ GHG FASG+ A++ I G+Q + AID + A + QL A+ E+
Sbjct: 1 MLSIILTGHGGFASGMEKAMKQILGEQSQFIAIDVPETSSTALLTSQLEEAIAQLDCEDG 60

Query: 58 TLVLCDLLGGTPFKVAATLMESLPKTTCTVLSGLNLAMLIEASFARQTAASFDDLVSGLI 117
+ L DLLGGTPF+VA+TL P C V++G NL +L+E R+ + + V L
Sbjct: 61 IVFLTDLLGGTPFRVASTLAMQKPG--CEVITGTNLCLLLEMVLEREGLSGEEFRVQAL- 117

Query: 118 TCSKEGI 124
C G+
Sbjct: 118 ECGHRGL 124

An alignment of the GAS and GBS proteins is shown below.

Identities = 73/146 (50%), Positives = 94/146 (64%), Gaps = 3/146 (2%)

Query: 1 MIKIIIVAHGNFPDGILSSLELIAGHQEYVVGINFIAGMSSNDVRVALQREVIDFKEILV 60
MI II++ HG+F GI+S+LELIAG QE V I+F M++ DV+ L R +I +E LV
Sbjct: 1 MIAIIVMGHGHFASGIVSALELIAGKQEKVTAIDFTTEMTAADVQDQLSRALIPEEETLV 60

Query: 61 LTDLLGGTPFNVSSALSVEYTDKKIKVLSGLNLSMLMEAVLSRTMFEHVDDLVDKVITSS 120
L DLLGGTPF V++ L + VLSGLNL+ML+EA +R DDLV + IT S
Sbjct: 61 LCDLLGGTPFKVAATLMESLPNTTCNVLSGLNLAMLIEASFARQTAASFDDLVSGLITCS 120

Query: 121 HEGIVDFSTCLATQTAEATFE--GGI 144
EGIVD+ T L+ Q AT + GGI
Sbjct: 121 KEGIVDWKT-LSQQEDGATDDELGGI 145

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.
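
The "Final Results" blocks above list one certainty value per predicted bacterial compartment. As a rough sketch of how such a block could be read mechanically (the regular expression and the function names are assumptions made for illustration, not part of the original localization software), the values for the GBS protein of Example 762 can be parsed and the highest-certainty compartment selected:

```python
import re

# Assumed line layout, taken from the result lines printed in this document:
#   bacterial <site> [---] Certainty=<value> (<verdict>) < succ>
# Stray spaces inside the number (e.g. "0 . 0886") are tolerated.
_RESULT = re.compile(
    r"bacterial\s+(membrane|outside|cytoplasm)\W*Certainty\s*=\s*"
    r"([\d.\s]+?)\s*\((Affirmative|Not Clear)\)"
)


def parse_final_results(text):
    """Map each compartment to a (certainty, verdict) tuple."""
    results = {}
    for site, value, verdict in _RESULT.findall(text):
        results[site] = (float(value.replace(" ", "")), verdict)
    return results


def best_location(results):
    """Return the compartment with the highest certainty value."""
    return max(results, key=lambda site: results[site][0])


sample = """
bacterial cytoplasm --- Certainty=0.0886 (Affirmative) < succ>
bacterial membrane --- Certainty=0 . 0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
"""
parsed = parse_final_results(sample)
location = best_location(parsed)
```

For the sample block this selects the cytoplasm, in agreement with the single "Affirmative" line.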

Example 763 

A DNA sequence (GBSx0811) was identified in S.agalactiae <SEQ ID 2341> which encodes the amino
acid sequence <SEQ ID 2342>. This protein is predicted to be unsaturated glucuronyl hydrolase. Analysis
of this protein sequence reveals the following:

Possible site: 48

>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -0.11 Transmembrane 172 - 188 ( 172 - 188)

Final Results

bacterial membrane Certainty=0.1044 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database. 



Query: 30 EFAIEICALKQLYINIDYFGEEYPTPATFNNIYKVTO3OTEWTNGFWTGCLWLAYEYNQDKK 89 

++A+ ++ NI F +P + Y++ +N EWTNGFW+G LWL YEY D 

Sbjct: 4 KQAMTDVAEKTLTNIKRFNGRFPHVSEDGEHYEIjNlSINNEWTNGFWSGILWLCYEYTNDP^ 63 

35 Query: 90 LKNIAHKITOjSFLNRINNRIALDHHDLGFLYTPSCTAEYRINGDVKALEATIKAADKLME 149 

+ A V SF R+ + LDHHD+GFLY+ S A++ I D +A + TI+AAD LM+ 
Sbjct: 64 FRQAAASTVRSFQQRMEQNLELDHHDIGFLYSLSSKAQWIIERDERAKQLTIEAADVLMK 123 

Query: 150 RYQEKGGFIQAWGELG-YKEHYRLIIDCLLNIQLLFFAYEQTGDEKYRQVAVNHFYASAN 208 
40 R++EK QAWG G R+I+DCL+N+ LLF+A E TG+ YR+ A+ H + 

Sbjct: 124 RWREKIELFQAWGPEGDLSNGGRI IVDCLMNLPLLFWASEVTGNPDYREAAI IHADKTRR 183 

Query: 209 NWRDDSSAFHTFYFDPETGEPLKGVTRQ3YSDES SWARGQAWGI YGI PLSYRKMKDYQQ 268 
+VR D S +HTFYF+ ETGE L+G T QGY D S+W+RGQAW IYG ++YR + + 
45 Sbjct: 184 FIVRGDDSTYHTFYFNQETGEALRGGTHQGYEDGSTWSRGQAWAIYGFAIAYRYTGNERY 243 

Query: 269 I ILFKGMTNYFLNRLPEDKVSYWDLI FTDGSGQPRDTSATATAVCGIHEMLKYLPEVDPD 328 

+ K YF+ LP D V+YWD RD+SA+A A CGI E+L +L E DPD 

Sbjct: 244 LETAKRTAICfFIENLPADYVAYWDFNAPITPDTKRDSSASAIASCGILELLSHLQETDPD 303 

50 

Query: 329 KETYKYAMHTMLRSLIEQYSNNELIAGRPLLLHGWSWHSGKGVDEGNIWGDYYYLEALI 388 

K ++ ++ + SL+E Y++ + G L+ G YS G D+ IWGDY+Y EAL+ 
Sbjct: 304 KAFFQQSVQKQMTSLVENYASEKDAQG- -LIKRGSYSVRIGHAPDDYVIWGDYFYTEALM 361 

Query: 389 RFYKDWELYW 398
R K YW
Sbjct: 362 RLEKLRNGYW 371

A related DNA sequence was identified in S.pyogenes <SEQ ID 2343> which encodes the amino acid
sequence <SEQ ID 2344>. Analysis of this protein sequence reveals the following:

Possible site: 33
>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -0.37 Transmembrane 173 - 189 ( 173 - 189)

Final Results

bacterial membrane Certainty=0.1150 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

10 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 273/395 (69%) , Positives = 336/395 (84%) 

Query: 4 IKPVKVESIENPKRFLNSRLLTKIEVEFAIZKALKQLYINIDYFGEEYPTPATFNNIYKV 63 
15 +K + +E 1+ P+RF L++ ++ +A++ ALKQ+ +N+DYF E++PTPAT 4-N Y + 

Sbjct: 5 LKTIALEPIKQPERFTKEDFLEQEDITQALDIjALKQWLNMDYFKEDFPTPATKDNQYAI 64 

Query: 64 MDNTEWTNGFWTGCLWIAYEYNQDKKLKNIAHKNVLSFIjNRINNRIALDHHDLGFLYTPS 123 
MDNTEWTN FWTGCLW1AYEY+ D +K +A N LSFL+R+ I LDHHDLGFLYTPS 
20 Sbjct: 65 MDOTEWTNAFWTGCLWLAYEYSGDDAIKALAQANDLSFLDRVTRDIELDHHDLGFLYTPE 124 

Query: 124 CTAEYRINGDVKALEATIICAADKLMERYQ3KGGFIQAWGELGYKEHYRLIIDCLLNIQLL 183 

C AE+++ ++ EA +KAADKL++RYQ+KGGFIQAWGELG KE YRLI IDCLLNIQLL 
Sbjct: 125 CMAEWKLLKTPESREAALKAADKLVQRYQDKGGFIQAWGELGKKEDYRLIIDCLLNIQLL 184 

25 

Query: 184 FFAYEQTGDEKYRQVAVNHFYASANNVVRDDSSAFHTFYFDPETGEPLICGVTRQGYSDES 243 

FFA ++TGD +YR +A+NHFYASAN+V+RDD+SA+HTFYFDPETG+P+KGVTRQGYSD+S 
Sbjct: 185 FFASQETGDNRYRDMAINHFYASANHVIRDDASAYHTFYFDPETGDPVKGVTRQGYSDDS 244 

30 Query: 244 SWARGQAWGIYGIPLSYRKMKDYQQIILFKGMTNYFLNRLPEDKVSYWDLIFTDGSGQPR 303 

+WARGQAWGIYGI PL+YR +K+ + I LFKGMT+YFLNRLP+D+VSYWDLIF DGS Q R 
Sbjct: 245 AWARGQAWGIYGIPLTYRFLKEPELIQLFKGMTHYFLNRLPKDQVSYWDLIFGDGSEQSR 304 

Query: 304 DTSATATAVCGIHEMLKYLPEVDPDKETYKYAMHTMLRSLIEQYSNNELIAGRPLLIiHGV 363 
35 D+SATA AVCGIHEMLK LP+ DPDK+TY+ AMH+MLR+LI+ Y+N +L G PLLLHGV 

Sbjct: 305 DSSATAIAVCGIHEMLKTLPDHDPDKKTYEAAMHSMLRALIKDYANKDLKPGAPLLLHGV 364 

Query: 364 YSWHSGKGVDEGNIWGDYYYLEALIRFYKDWELYW 398 
YSWHSGKGVDEGNIWGDYYYLEAL+RFYKDW YW 
40 Sbjct: 365 YSWHSGKGVDEGNIWGDYYYLEALLRFYKDWNPYW 399 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 
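
Both proteins of Example 763 carry a single predicted transmembrane segment, reported on an "INTEGRAL Likelihood" line. The sketch below shows one way such lines might be turned into structured records; the `Segment` type and the pattern are illustrative assumptions based only on the line layout printed above.

```python
import re
from typing import NamedTuple


class Segment(NamedTuple):
    """One predicted transmembrane segment (hypothetical record type)."""
    likelihood: float
    start: int
    end: int


# Assumed layout: "INTEGRAL Likelihood = -0.11 Transmembrane 172 - 188 ( 172 - 188)"
_INTEGRAL = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+\.\d+)\s+Transmembrane\s+(\d+)\s*-\s*(\d+)"
)


def parse_segments(text):
    """Return every transmembrane segment found in the given text."""
    return [
        Segment(float(score), int(start), int(end))
        for score, start, end in _INTEGRAL.findall(text)
    ]


sample = """
INTEGRAL Likelihood = -0.11 Transmembrane 172 - 188 ( 172 - 188)
INTEGRAL Likelihood = -0.37 Transmembrane 173 - 189 ( 173 - 189)
"""
segments = parse_segments(sample)
lengths = [seg.end - seg.start + 1 for seg in segments]
```

In this sample the first line is the GBS protein and the second the GAS protein; both segments span 17 residues and sit at nearly the same position, as expected for closely related sequences.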

Example 764 

A DNA sequence (GBSx0812) was identified in S.agalactiae <SEQ ID 2345> which encodes the amino
acid sequence <SEQ ID 2346>. Analysis of this protein sequence reveals the following:

Possible site: 36

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.3035 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAC44679 GB:U65015 PTS permease for mannose subunit IIIMan C
terminal domain [Vibrio furnissii]
Identities = 63/125 (50%), Positives = 89/125 (70%), Gaps = 1/125 (0%)

Query: 5 PNIVMTRVDERLIHGQ-GQLWVKFLSCNTVIVANDDVSKDHLQQTLMKTVVPESIALRFF 63
PNIV++R+DERL+HGQ G WV F N V+VAND+V+ D +QQ LM+ V+ + IA+RF+
Sbjct: 2 PNIVLSRIDERLVHGQVGVQWGFADi^IVVVANDEWiADTIQQNLMEMVLADGIIAIRFW 61

Query: 64 DIQICVIDIIHKANPAQTIFIIVKDU03VYRLVAG6VPIKEINIGNIHNGESKEQVSRSIF 123 

+QK ID IHKA+ QlttK D RLV GGVPI IN+GN+H +GK Q+S+++ 
Sbjct: 62 TVQKTIDTIHKASDRQRILLVCKTPHDFP.RLVEGGVPIAAINVGNMHYIDGKTQISKTVS 121 

Query: 124 LGMKD 128 
+ +D 

Sbjct: 122 VDAED 126 

A related DNA sequence was identified in S.pyogenes <SEQ ID 2347> which encodes the amino acid
sequence <SEQ ID 2348>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2511 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

>GP:BAA84216 GB:AB019619 unsaturated glucuronyl hydrolase [Bacillus
sp. GL1]
Identities = 161/369 (43%), Positives = 220/369 (58%), Gaps = 1/369 (0%)

Sbjct:
92
Sbjct:
152
Sbjct: 124
Query: 211
Sbjct: 184
271
Sbjct: 244
304
391
Sbjct: 364

SF +R+ R LDHHD+GFLY+ S A+W + K +R+ AL AAD L++R
++ G IQAWG G E+ R+IIDCLLN+ LL +A ++TGD YR +A H
+4R D S+YHTFYFDPE G+ ++G T QG +D S V
K M +FL R+P+D V YWD RDSSA+AI CG+ E+
D+ IWGDYYYLEALLR



An alignment of the GAS and GBS proteins is shown below.

Identities = 112/160 (70%), Positives = 132/160 (82%), Gaps = 1/160 (0%)

Query: 5 PNIVMTRVDERLIHGQGQLWVKFLSCNTVIVANDDVSKDHLQQTLMKTVVPESIALRFFD 64
PNI+MTRVDERLIHGQGQLWVKFL+CNTVIVAND VS+D +QQ+LMKTV+P SIA+RFF
Sbjct: 4 PNIIMTRVDERLIHGQGQLWVKFLNCNTVIVANDAVSEDKIQQSLMKTVIPSSIAIRFFS 63

Query: 65 IQKVIDIIHKANPAQTIFIIVKDLKDVYRLVAGGVPIKEINIGNIHNGEGKEQVSRSIFL 124
IQKVIDIIHKA+PAQ+IFI+VKDL+D LV GGVPI EINIGNIH + K +++ I L
Sbjct: 64 IQKVIDIIHKASPAQSIFIVVKDLQDAKLLVEGGVPITEINIGNIHKTDDKVAITQFISL 123

Query: 125 GMKDKEIIRKLNQEYHIAFNTKTTPTGNDGAVEVNILDYI 164
G DK IR L ++H+ FNTKTTP GN A +V+ILDYI
Sbjct: 124 GETDKSAIRCLAHDHHVVFNTKTTPAGN-SASDVDILDYI 162

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.
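
Each alignment in this section is introduced by a summary line giving identities, positives and gaps as counts over the alignment length. A minimal sketch of reading such a line follows; the function name and the pattern are hypothetical, and the example string is the GAS/GBS summary line of Example 764. The printed percentages are left unparsed because their rounding convention is not stated in the text.

```python
import re

# Assumed field layout: "<Name> = <count>/<length> (<percent>%)".
# A '-' in place of '=' is accepted as well, since some lines in the
# extracted text show that variant.
_FIELD = re.compile(r"(Identities|Positives|Gaps)\s*[=-]\s*(\d+)\s*/\s*(\d+)")


def parse_alignment_summary(line):
    """Return {field: (count, length, fraction)} for one summary line."""
    summary = {}
    for name, count, length in _FIELD.findall(line):
        count, length = int(count), int(length)
        summary[name] = (count, length, count / length)
    return summary


example = "Identities = 112/160 (70%), Positives = 132/160 (82%), Gaps = 1/160 (0%)"
stats = parse_alignment_summary(example)
```

All three fields of one summary line share the same denominator, so a mismatch between them is a quick indicator of a transcription error.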

Example 765 

A DNA sequence (GBSx0813) was identified in S.agalactiae <SEQ ID 2349> which encodes the amino
acid sequence <SEQ ID 2350>. This protein is predicted to be AgaW (agaC). Analysis of this protein
sequence reveals the following:

Possible site: 25

>>> Seems to have a cleavable N-term signal seq.

Final Results

bacterial membrane Certainty=0.3781 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAF81084 GB:AF228498 AgaW [Escherichia coli]
Identities = 93/295 (31%), Positives = 140/295 (46%), Gaps = 48/295 (16%)

Query: 1 MDISILQAVLIGLWTAFCFSGMLLGL-YTNRCIVIiSr^VGVILGDIQTALAVGAISELAY 59 
M+IS+LQA +G+ M GL + +R +VL VG++LGD+ T + G EL + 

30 Sbjct: 1 MEISLLQAFALGIIAFIAGLDMFNGLTHMHRPWLGPLVGLVLGDLHTGILTGGTLELVW 60 



Query: 60 MGFGVGAGGTVPPNPIGPGIFGTLMAITTAGTKGKITPEAALALSTPIAVGIQFLQTATY 119 

MG AG PPN I I GT AITT + P+ A+ ++ P AV +Q T + 

Sbjct: 61 MGLAPLAGAQ- PPNVI IGTIVGTAFAITTG VKPDVAVGVAVPFAVAVQMGITFLF 114 

35 

Query: 120 TAFAGAPETAKK ALQAGNFRGFXI AANGT ~ IWAFAGLGFGLGVLGALSTQTL 170 

+ +G + AL A N+ N + AF + FG A +T+ 

Sbjct: 115 SVMSGVMSRCARMPRTPIIAAI^ACNYLALLALGNFYFLCAFLPIYFG AEHAKTI 169 

40 Query: 171 TDLFALIPPVLLNGLTLAGKMLPAIGFAMILSVMAKKELIPYILLGYVLAvYFGLPVLTP 230 

D+ +P L++GL +AG ++PAIGFA++L +M K IPY +LG+V A + LPVL 
Sbjct: 170 IDV---LPQRLIDGLGVAGGIMPAIGFAVLLKIMMKNVYIPYFILGFVAAAWLKLPVL-- 224 



Query: 231 TANGDG VLTSVATNSVLGVPT IGVAI I AT I FALLD I FRKPAAPTKETKTEGDNQD 285 

+A A AL+D+ RK PT+ + + +D 
Sbjct: 225 -- AIACPALAMALIBLLRKSPEPTQPAAQKEEFED 257 



A related DNA sequence was identified in S.pyogenes <SEQ ID 2351> which encodes the amino acid
sequence <SEQ ID 2352>. Analysis of this protein sequence reveals the following:

Possible site: 52

>>> Seems to have a cleavable N-term signal seq.
INTEGRAL Likelihood = -6.37 Transmembrane 220 - 236 ( 214 - 241)
INTEGRAL Likelihood = -5.10 Transmembrane 146 - 162 ( 144 - 165)
INTEGRAL Likelihood = -1.59 Transmembrane 184 - 200 ( 184 - 202)

Final Results

bacterial membrane Certainty=0.3548 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

>GP:AAC44680 GB:U65015 PTS permease for mannose subunit IIPMan
[Vibrio furnissii]
Identities = 85/255 (33%), Positives = 137/255 (53%), Gaps = 11/255 (4%)

Query: 1 MDINLLQALLIGLWTAFCFSGMLLGI-YTNRCIII1SFGVGIILGDLPTALSMGAISEI1AY 59 

M+I h QAL++GL + G+ + +R ++L VG+ILGDL T + +G EL + 

Sbjct: 1 MEIGLFQALMLGLLAFLAGLDLFNGLTHFHRPWI.GPLVGLILGDLHTGILVGGTLELIW 60 

Query: 60 MGFGVGAGGTVPPNPIGPGIFGTLMAITSAGKVTPEAALALSTPIAVAIQFLQTFAYTAF 119 

MG AG PPN I I GT AIT+ VP A+ 4-+ P AVA+Q T +4A 
Sbjct: 61 MGLAPLAGAQ - PPNVI IGTIVGTTFAITT- -NVEPNVAVGVAVPFAVAVQMGITLLFSAM 117 

Query: 120 AGAPETAKKQLQKGNIRGFK FAANGTIWAFAFIGLGLGLLGALSMDTLLHLVDYIPP 176 

+ + + + RG + + A + +F F+ L + L D 4-V +P 

Sbjct: 118 SAVMSKCDEYAKNADTRGIERVJ^FAIAVL3SFYFLCAFLPIY--LGADHAGAMVAALPK 175 

Query: 177 VLI^GLTVAGKMLPAIGFAMILSVMAKKELIPFVLIGYVCAAYLQIPTIGIAIIGIIFAL 236 

L++GL VAG ++PAIGFA+++ +M K IP+ ++G+V AA+LQ+P +1 A+ 
Sbjct: 176 ALIDGLGVAGGIMPAIGFAVLMKII^MKNAYIPYFILGFVAAAWLQLPILAIRCARTAMAI 235 



Query: 237 NEFYNK--PKQVDAT 249
+F K P V+A+
Sbjct: 236 IDFMRKSEPTPVNAS 250



An alignment of the GAS and GBS proteins is shown below. 

Identities = 203/288 (70%), Positives = 225/288 (77%), Gaps = 28/288 (9%) 

Query: 1 MDISILQAVLIGLWTAFCFSGMLI^LYTNRCIVLSLGVGVILGDIQTALAVGAISEIAYM 60 

MDI++LQA+LIGLWTAFCFSGMLLG+Y1WRCI+LS GVG+ILGD+ TAL++GAISELAYM 
Sbjct: 1 MDINLLQADLIGLWTAFCFSGMLLGIYTNRCIILSFGVGIILGDLPTALSMGAISELAYM 60 

Query: 61 GFGVGAGGTVPPNPIGPGIFGTLMAITTAGTKGKITPEAALALSTPIAVGIQFLQTATYT 120 

GFGVGAGGTVPPNPIGPGIFGTLMAIT+AG K+TPEAALALSTPIAV IQFLQT YT 
Sbjct: 61 GFGVGAGGTVPPNPIGPGIFGTLMAITSAG KVTPEAALALSTPIAVAIQFLQTFAYT 117 

Query: 121 AFAGAPETAKKALQAGNFRGFKIAANGTIWAFAGLGFGLGVLGALSTQTLTDLFRLIPPV 180 

AFAGAPETAKK LQ GN RGFK AANGTIWAFA +G GLG+LGALS TL L IPPV 
Sbjct: 118 AFAGAPETAKKQLQKGNIRGFKFAANGTIWAFAFIGLGLGLLGALSMDTLLHLVDYIPPV 177 

Query: 181 LLNGLTLAGfCMLPAIGFAMILSVMAKKELIPYILLGYVLAVYFGLPVLTPTANGDGVLTS 240 

LLNGLT+AGKMLPAIGFAMILSVMAKKELIP++L+GYV A Y 
Sbjct: 178 LLNGLTVAGKMLPAIGFAMILSVMAKKELIPFVLIGYVCAAY 219 

Query: 241 VATNSVLGVPT IGVAI IATI FALLD I FRKPAAPTKETKTEGDNQDDWI 28B 

L +PTIG+AII IFAL + + KP T +G QDDWI 

Sbjct: 220 LQIPTIGIAIIGI IFALNEFYNKP - KQVDATTVQG3QQDDWI 260 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.



Example 766 

A DNA sequence (GBSx0814) was identified in S.agalactiae <SEQ ID 2353> which encodes the amino
acid sequence <SEQ ID 2354>. Analysis of this protein sequence reveals the following:

Possible site: 31

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2442 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.
No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 767 

A DNA sequence (GBSx0815) was identified in S.agalactiae <SEQ ID 2355> which encodes the amino 
acid sequence <SEQ ID 2356>. This protein is predicted to be PTS permease for mannose subunit IIBMan. 
Analysis of this protein sequence reveals the following: 

Possible site: 43

>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -8.28 Transmembrane 278 - 294 ( 272 - 294)
INTEGRAL Likelihood = -3.45 Transmembrane 155 - 171 ( 155 - 174)
INTEGRAL Likelihood = -1.59 Transmembrane 250 - 266 ( 250 - 267)

Final Results

bacterial membrane Certainty=0.4312 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 8657> which encodes amino acid sequence <SEQ ID 8658>
was also identified. Analysis of this protein sequence reveals the following:

Lipop: Possible site: -1 Crend:
McG: Discrim Score: -9.70
GvH: Signal Score (-7.5): -6.12
Possible site: 19
>>> Seems to have no N-terminal signal sequence
ALOM program count: 3 value: -8.28 threshold: 0.0
INTEGRAL Likelihood = -8.28 Transmembrane 254 - 270 ( 248 - 270)
INTEGRAL Likelihood = -3.45 Transmembrane 131 - 147 ( 131 - 150)
INTEGRAL Likelihood = -1.59 Transmembrane 226 - 242 ( 226 - 243)
PERIPHERAL Likelihood = 0.37 175
modified ALOM score: 2.16

*** Reasoning Step: 3

Final Results

bacterial membrane Certainty=0.4312 (Affirmative) < succ>
bacterial outside Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAA57943 GB:U18997 ORF_o290; Geneplot suggests frameshift
linking to o267, not found [Escherichia coli]
Identities = 101/278 (36%), Positives = 164/278 (58%), Gaps = 6/278 (2%)

Query: 17 LRQKETTKMTGSKKLAKSDYTKTALRAFYLQNGFNYSNYQGLGYANVIYPALKKYYGDDK 76
++ K+ T GS+ ++K D T+ R+ LQ FNY Q G+ + P LKK Y DDK
Sbjct: 19 VIMKKRTTAMGSE-ISKKDITRLGFRSSLLQASFNYERMQAGGFTWAMLPILKKIYKDDK 77

Query: 77 KALAGALEENVEFYNTNPHFLPFVTSLHLAMLDNERPEEEIRGIKMALMGPLAGIGDSLS 136
L+ A+++N+EF NT+P+ + F+ L ++M + + I+G+K+AL GP+AGIGD++
Sbjct: 78 PGLSAAMKDNIEFINTHPNLVGFLMGLLISMEEKGEKRDTIKGLKVALFGPIAGIGDAIF 137

Query: 137 QFCLAPLFSTIAASLATDGLVMGPILFFVAMNTILTGIKLVTGMYGYRLGTSFIDKLSEQ 196
F L P+ + I +S A+ G ++GPILFF A+ ++ +++ GY +G IDK+ E
Sbjct: 138 WFTLLPIMAGICSSFASQGNLLGPILFF-AVYLLIFFLRVGWTHVGYSVGVKAIDKVREN 196



Query: 197 MSVISRAANIVGVTVISSLAATQVKLTIPYTFAPEKVTSTTQKIVTVQGMLDKIAPALLP 256
+I+R+A I+G+TVI L A+ V + + +FA T + Q DK+ P +LP
Sbjct: 197 SQMIARSATILGITVIGGLIASYVHINVVTSFA----IDNTHSVALQQDFFDKVFPNILP 252

Query: 257 ALYTFLMFYLIKNKKWTTYKLVILTVIIGILGSWLGIL 294
YT LM+Y ++ KK L+ +T ++ I+ S GIL
Sbjct: 253 MAYTLLMYYFLRVKKAHPVLLIGVTFVLSIVCSAFGIL 290

A related DNA sequence was identified in S.pyogenes <SEQ ID 2357> which encodes the amino acid
sequence <SEQ ID 2358>. Analysis of this protein sequence reveals the following:

Possible site: 45
>>> Seems to have no N-terminal signal sequence

INTEGRAL Likelihood = -8.49 Transmembrane 276 - 292 ( 270 - 292)
INTEGRAL Likelihood = -7.01 Transmembrane 151 - 167 ( 149 - 176)
INTEGRAL Likelihood = -3.03 Transmembrane 202 - 218 ( 202 - 220)
INTEGRAL Likelihood = -2.13 Transmembrane 249 - 265 ( 248 - 265)

Final Results

bacterial membrane Certainty=0.4397 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:

>GP:AAA57943 GB:U18997 ORF_o290; Geneplot suggests frameshift
linking to o267, not found [Escherichia coli]
Identities = 104/285 (36%), Positives = 162/285 (56%), Gaps = 7/285 (2%)

Query: 8 NKSMQQLSKEANKMTGSNKLTKKDYLKTALRAFFLQNGFNYNNYC^IGYANVIYPALKKH 67 

N+S + + ++++KKD + R+ LQ FNY Q G+ + P LKK 

Sbjct: 13 NRSPLPVKMKKRTTAMGSEISKKDITRLGFRSSLLQASFNYERMCAGGFTWAMLPILKKI 72 



Query: 128 GDSLSQFCLAPLFSTIAASLASDGLVLGPILFFLAMNIILTAIKIGSGLYGYKVGTSFID 187 

GD++ F L P+ + I +S AS G +LGPILFF A+ +++ +++G GY VG ID 
Sbjct: 133 GDAIFWFTLLPIMAGICSSFASQGNLLGPILFF-AVYLLIFFLRVGWTHVGYSVGVKAID 191 

Query: 188 KLSEQ^WSRMANIVGVTVIAGIjAATSWITVPITFA^^ 247 

K+ E +++R A I+G+TVI GL A+ V I V +FA + Q F DK 
Sbjct: 192 KVRENSQMIARSATILGITVIGGLIASYVHINWTSFAIDNTHSVALQQDF FDK 245 

Query: 248 IAPALLPALFTLLMYYLIKNKKWTTYKLVILTVIIGVIGSWLGIL 292 

+ P +LP +TLLMYY ++ KK L+ +T ++ ++ S GIL 

Sbjct: 246 VFPNILPMAYTLLMYYET^VKKAHPVLLIGVTFVLSIVCSAFGIL 290 

An alignment of the GAS and GBS proteins is shown below.

Identities = 224/288 (77%), Positives = 255/288 (87%), Gaps = 4/288 (1%)

Query: 12 HLLKKLRQ--KETTKMTGSKKLAKSDYTKTALRAFYLQNGFNYSNYQGLGYANVIYPALK 69
+L K ++Q KE KMTGS KL K DY KTALRAF+LQNGFNY+NYQG+GYANVIYPALK
Sbjct: 6 NLNKSMQQLSKEANKMTGSNKLTKKDYLKTALRAFFLQNGFNYNNYQGIGYANVIYPALK 65

Query: 70 KYYGDDKKALAGALEENVEFYNTNPHFLPFVTSLHLAMLDNERPEEEIRGIKMALMGPLA 129
K++G+DKK L ALE+N EFYNTNPHFLPF+TSLHL ML+N RPEEE R IKMALMGPLA
Sbjct: 66 KHFGNDKKGLYQALEDNCEFYNTNPHFLPFITSLHLVMLENNRPEEETRNIKMALMGPLA 125

Query: 130 GIGDSLSQFCLAPLFSTIAASLATDGLVMGPILFFVAMNTILTGIKLVTGMYGYRLGTSF 189
GIGDSLSQFCLAPLFSTIAASLA+DGLV+GPILFF+AMN ILT IK+ +G+YGY++GTSF
Sbjct: 126 GIGDSLSQFCLAPLFSTIAASLASDGLVLGPILFFLAMNIILTAIKIGSGLYGYKVGTSF 185

Query: 190 IDKLSEQMSVISRAANIVGVTVISSLAATQVKLTIPYTFAPEKV--TSTTQKIVTVQGML 247
IDKLSEQM+V+SR ANIVGVTVI+ LAAT VK+T+P TFA KV +T QK VT+QGML
Sbjct: 186 IDKLSEQMAWSRMANIVGVTVIAGLAATSWITVPITFJV^GKVDAANTAQKPVTIQGML 245

Query: 248 DKIAPALLPALYTFLMFYLIKNKKWTTYKLVILTVIIGILGSWLGILA 295
DKIAPALLPAL+T LM+YLIKNKKWTTYKLVILTVIIG++GSWLGILA
Sbjct: 246 DKIAPALLPALFTLLMYYLIKNKKWTTYKLVILTVIIGVIGSWLGILA 293

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.
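
Every residue row in these alignments carries its own start and end coordinates, so the number of non-gap residues on a row must equal end minus start plus one. The sketch below uses that relation as a sanity check on transcribed rows; the function name and the row pattern are assumptions for illustration. The first test row is the final query row of the GAS/GBS alignment in Example 767, and the second is the corresponding query row from the E. coli alignment as it appears with a character-recognition error at its end.

```python
import re

# Assumed row layout: "<Query|Sbjct>: <start> <residues> <end>"
_ROW = re.compile(r"^(Query|Sbjct):\s*(\d+)\s+(\S+)\s+(\d+)\s*$")


def row_is_consistent(line: str) -> bool:
    """True when the coordinate span matches the number of ungapped residues."""
    match = _ROW.match(line.strip())
    if match is None:
        return False
    _, start, residues, end = match.groups()
    ungapped = sum(1 for ch in residues if ch != "-")
    return int(end) - int(start) + 1 == ungapped


clean_row = "Query: 248 DKIAPALLPALYTFLMFYLIKNKKWTTYKLVILTVIIGILGSWLGILA 295"
damaged_row = "Query: 257 ALYTFLMFYLIKNKKWTTYKLVILTVIIGILGSWLGIIi 294"
clean_ok = row_is_consistent(clean_row)
damaged_ok = row_is_consistent(damaged_row)
```

The check does not say which character is wrong, only that a row has gained or lost residues relative to its stated coordinates.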

10 Example 768 

A DNA sequence (GBSx0816) was identified in S.agalactiae <SEQ ID 2359> which encodes the amino
acid sequence <SEQ ID 2360>. Analysis of this protein sequence reveals the following:

Possible site: 58

>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -0.37 Transmembrane 135 - 151 ( 135 - 151)

Final Results

bacterial membrane Certainty=0.1150 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB01924 GB:Z79691 OrfA [Streptococcus pneumoniae]
Identities = 76/206 (36%), Positives = 124/206 (59%), Gaps = 1/206 (0%)

25 

Query: 428 SWTYNSYPKCDYCQLTSKDRYHLVEGQLHVQRASDIYYHKRWLIjTLPQAITLVIDKVSCP 487 

SW Y YP +C ++ H +EG Y HKR +L L + + L++D + C 

Sbjct: 2 SWEYEYYPHSLFCHHKEREGMHYIEGAYWSAEPDLPYLHKRKILMLVEDVWLLVDDIRCQ 61 

30 Query: 488 GEHVLTNQYILDDQVIYENGFVNDLKLVSPTTFNLEDCLISKRYNQLTESHKLVKKIKFV 547 

G+H Q+ILD V Y++G +N L+L S F+LED +IS +YN+L S KL K+ F 
Sbjct: 62 GQHEALTQFILDKDVTYQDGKINQLRLWSEVDFDLEDTIISPKYNELERSSKLTKRQFFE 121 

Query: 548 DEVMDYTLIVDRNCQVKYVPLVQTNSHKELSNSIAFDIRSQDFHYLIGVLMDDIIFGDKL 607 
35 ++++DYT+I + ++ + QT+ +E+ N++AF++++ + LI +L +DI G+KL 

Sbjct: 122 NQMLDYTIIAHESFEIIRHSVYQTDD-REVENALAFEVKNDETDKLILLLSEDIRVGEKL 180 



Query: 608 YLMQGIKCKGKVIVYDKNNGKMSRLK 633 
L+ G K +GK +VYDK N +M RL+ 
40 Sbjct: 181 CLVDGTKMRGKCLVYDKINERMIRLQ 206 

A related DNA sequence was identified in S.pyogenes <SEQ ID 2361> which encodes the amino acid
sequence <SEQ ID 2362>. Analysis of this protein sequence reveals the following:

Possible site: 53
>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -2.55 Transmembrane 477 - 493 ( 477 - 493)

Final Results

bacterial membrane Certainty=0.2020 (Affirmative) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the databases:



Query: 434 SWAYLSYPKSNYCHLRQNGHVYFIEGSYQTQFSDRNNYQHDRQILILPPGIFLIIDTIQA 493 

SW Y YP S +CH ++ +++IEG+Y + D Y H R+IL+L ++L++D 1+ 
Sbjct: 2 SWEYEYYPHSLFCHHKEREGMHYIEGAYWSAEPDLP-YLHKRKILMLVEDVWLLVDDIRC 60 
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Query: 494 QGNHCLVSQFILDNHLDVKTDHLSDLRLISDCPFTIEETILSKKYNQY1TSHKLIKRKPF 553 

QG H ++QFILD + + ++ LRL S+ F +E+TI+S KYN+ S KL KR+ F 
Sbjct: 61 QGQHEALTQFILDKDVTYQDGKINQLRLWSEVDFDLEDTIISPICiTNELERSSKLTKRQFF 120 

Query: 554 KDKGCTSTLLVPDDTKVTPLTPLQTGKPJNPIETALSWHLKGKQFDYSICVLQEDLIKGEK 613 

+++ T++ + ++ + QT R +E AL++ +K + D I +L ED+ GEK 
Sbjct: 121 ENQMLDYTIIAHESFEIIRHSVYQTDDRE-VENALAFEVKIIDETDKlilLLLSEDIRVGEK 179 

Query: 614 LVLLNSHKIRGIWWINHITNEIIRLK 640 

L L++ K+RGK +V + I +IRL+ 
Sbjct: 180 LCLVDGTKMRGKCLVYDKINERMIRLQ 206 

An alignment of the GAS and GBS proteins is shown below. 

Identities = 282/631 (44%) , Positives = 414/631 (64%) , Gaps = 2/631 (0%) 

Query: 6 YNKFKD-FDREFCQICYIKTYQSNAYQEMJa^SvT^MMRNTFVFM)NTOMEPCSKAYCLDPL 64 

+ +FK+ + +FC+ Y+ YQ+++Y + K +L++ NTF+F DNWDMEPC Y LDP+ 
Sbjct: 11 FARFKETVNPDFCRNYLLDYQTDSYADQKRIADLLLTNTFLFEDNWDMEPCHIPYHLDPI 70 

Query: 65 EWDKPvTDDPEWLYMLNRQTYLFKFLWYIVEGDKSYLRQMKYFMYHWIDCQFTLKPEGA 124 

W + V DDPEW +MLNRQTYL K ++VY+VE D+ YL K F+ +WI+ L P+G 
Sbjct: 71 TWQFAVIDDPEWNFMLNRQTYLQKLILVYLVERDERYLLTAKGFILNWIESAIPLDPKGL 130 

Query: 125 VSRTIDTGIRCMSWLKVLIFLDYFGLITETKKIKLLTSLREQITYMRDYYREKDSLSNWG 1B4 

+RT+DTGIRC +W+K LI+L+ F +T+ ++ +L SL +Q+ ++ Y +K SLSNWG 
Sbjct: 131 ATRTLDTGIRCFAOTKCLIYr^FNALTKQEESLILASLEKQLQFLHANyLDKYSLSNWG 190 

Query: 185 ILQTTAILACLYYYEDELNLPEIQSFAEEELLLQIKLQILDDGSQYEQSIMYHVEVLKSL 244 

ILQTTAIL Y+ +FA +EL QI LQIL+DGSQ+EQS MYHVEVLK+L 

Sbjct: 191 ILQTTAILLADAYFGSDLDIAAATAFARKELTQQIALQILEDGSQFEQSTMYHVEVLKAL 250 

Query: 245 MELVII^PKYYLPLEETIEKMvTYIiIAOTGPDYCQIAIGDSDVTDTRDILTLATLVLKSS 304 

+EL L P Y L T+ M YL+ MTGPD+ Q+ +GDSDVTDTRDILTLA +L+ 
Sbjct: 251 LELTALVPDYLPQLRPTLIAMSDYLLKMTGPDHKQIPLGDSDVTDTRDILTLAATILEEP 310 

Query: 305 KTKSFSFDNVNLETLLLFGKPSIYLFEEIPRATIGESAYLFPDSGHVCLRDDRRYIFFKN 364 

K+ +F ++4++LLL G+ ++ FE++P T+ A+ F SGH+ + + Y+FFKN 
Sbjct: 311 HLKAAAFPTLDIDSLLLLGEKGVHTFEQLPVQTLPTFAHHFEHSGHITINQENYYLFFKN 370 

Query: 365 GPFGSAHTHSDNKSVCLYDKKKPIFIDAGRYTYKEEQLRYDFKRSTSHSTCTLDGQPLEM 424 

GP GS+HTHSD NS+CLY K +P+F DAGRYTYKEE LRY K ++ HST L+ Q E 
Sbjct: 371 GPIGSSHTHSDQNSLCLYYKGQPLFCDAGRYTYKEEPLRYALKSASHHSTAFLEEQLPEQ 430 

Query: 425 IKDSWTYNSYPKCDYCQLTSKDRYHLVEGQLHVQRAS-DIYYHKRWLLTLPQAITLVIDK 483 

I SW Y SYPK +YC L + +EG Q + + YHR+LLP I L+1D 

Sbjct: 431 IDSSWAYLSyPKSNYCHLRQNGHVYFIEGSYQTQFSDRKNYQHDRQILILPPGIFLIIDT 490 

Query: 484 VSCPGEHVLTNQYILDDQVIYENGFVNDLKLVSPTTFNLSDCLISKRYNQLTESHKLVKK 543 

+ G H L +Q+ILD+ + + ++DL+L+S F +E+ ++SK+YNQ SHICL+K+ 
Sbjct: 491 IQAQGNHCLVSQFILDNHLDVKTDHLSDLRLISDCPFTIEETILSKKYNQYLTSHKLIKR 550 

Query: 544 IKFVDEVMDYTLIVDRNCQVICYVPLVQTNSHKELSNSIAFDIRSQDFHYLIGVLMDDIIF 603 

F D+ TD+V + +V + +QT + ++++ ++ + F Y I VL +D+I 

Sbjct: 551 KPFKDKGCTSTLLVPDDTKVTPLTPLQTGKRNPIETALSWHLKGKQFDYSICVLQEDLIK 610 

Query: 604 GDKLYLMQGIKCKGKVIVYDKNNGKMSRLKN 634 

G+KL L+ K +GKV+V + ++ RLK+ 
Sbjct: 611 GEKLVLLNSHKIRGKVWINHITNEIIRLKH 641 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 769

A DNA sequence (GBSx0817) was identified in S.agalactiae <SEQ ID 2363> which encodes the amino
acid sequence <SEQ ID 2364>. This protein is predicted to be RegR (kdgR). Analysis of this protein
sequence reveals the following:

Possible site: 57

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm Certainty=0.2545 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAB01925 GB:Z79691 RegR [Streptococcus pneumoniae]
Identities = 222/333 (66%), Positives = 279/333 (83%)



Query: 1
Sbjct: 1

Query: 61
K+TKL+GVLIGDITN+FSNQIVKGIE I Q GYQ+++GNSNY +SE+ YIE+ML LGV
Sbjct: 61

Query: 121
DGFIIQPTSNFRKYSRI+ EKKK MVFFDSQLYEH+TSWVK NNYDAVYDMTQ C+ +GY
Sbjct: 121

Query: 181
- F++ITADTS LSTRIERASGF+DAL n
Sbjct: 181

Query: 241
E+TLVF PNCWALP+VFT +K LN+++P+VGL+GFDN EWT FSSP VST+VQP++EEG+
Sbjct: 241

Query: 301
Q +ILI++IEG +
Sbjct: 301



A related DNA sequence was identified in S.pyogenes <SEQ ID 2365> which encodes the amino acid
sequence <SEQ ID 2366>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results

bacterial cytoplasm --- Certainty=0.2928 (Affirmative) < succ>

bacterial membrane Certainty=0.0000 (Not Clear) < succ>

bacterial outside Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below.

Identities = 214/333 (64%), Positives = 266/333 (79%), Gaps = 2/333 (0%)



Query: 1 MSKKMTINDIAQLSKTSKTTVSFFLNQKFEKMSDETRQRIQEVIDETGYRPSTIARSIjNS 60 

M +K+TI DIA+L+KTSKTTVSF+LN +F+KMS+ET+ EI E I T Y+PS ARSLN+ 
Sbjct: 13 MQRKOTIIODIAEI^KTSKTTVSFYI^GRFDKMSEETKNRISESIKATNYKPSIAARSIJIA 72 

Query: 61 KKTKLLGVLIGDITNTFSNQIVKGIEHITKQKGYQIIVGNSNYDAKSEEDYIENMLNLGV 120 

K TKL+GV+IGDITN+FSNQIVKGIE ++ GYQI I+GNSNYD E++ IE MLNLGV 
Sbjct: 73 KSTKLIGWIGDITNSFSNQIVKGIESKAQEFGYQI 1 IGNSNYDPSREDELIEKMENLGV 132 

Query: 121 DGFIIQPTSNFRKySRILICEKKKPWFFDSQLYEHKTSVIVKANNYDAVYDMTQECLNRGY 180

DGFI IQPTSNFRKYSRI+ KKK +VFFDSQLYEH+T+WVK NNYDAVYD Q+C+++GY 
Sbjct: 133 DGFIIQPTSNFRKYSRIIDIKKKKVVFFDSQLYEHRTNWVKTNNYDAVYDTIQQCIDKGY 192 

Query: 181 KKFIMITADTSLLSTRIERASGFMDALKDNGFGYDTLVIEDDDHSKSDIEDFLKAWPDK 240 

+ FIMIT + +LLSTRIERASGF+D L+ N + ++I+++ S I Fit + K 
Sbjct: 193 EHFIMITGNPNLLSTRIERASGFIDVLEANHLTHQEMIIDEMQTSSEAIAQFLQGSLTKK 252 

Query: 241 EETLVFAPNCWALPMVFTAMKNLNFDMPRVGLVGFDNIEWTDFSSPKVSTIVQPAYEEGE 300 

+LVF PNCWALP VFTAMK+L F++P +GLVGFDNIEWT FSSP ++TI+QPAYEEGE 
Sbjct: 253 --SLVFVPNCWALPKVFTAMKSLKFNIPEIGLVGFDNIEWTKFSSPTLTTIIQPAYEEGE 310 

Query: 301 QVAQILINRIEGDDSVDNQQIVDCQMFWKESTF 333 

Q +ILI+ IEG QQI DCQ+ W+ESTF 

Sbjct: 311 QATKILIDDIEGHSQEAKQQIFDCQVNWQESTF 343 
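
The summary line of an alignment such as the one above gives counts and percentages for identities, positives and gaps. As a rough cross-check, the percentages can be recomputed from the counts. The following is a minimal Python sketch only: the function name `parse_blast_summary` is hypothetical, and the assumption that the reported percentage is the truncated integer part is inferred from the figures in this document (266/333 is reported as 79%), not from any BLAST specification.

```python
import re

# Matches fields such as "Identities = 214/333 (64%)"; tolerant of stray spaces.
_FIELD = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)\s*\(\s*(\d+)\s*%\s*\)"
)

def parse_blast_summary(line):
    """Return {field: (count, length, reported_pct, recomputed_pct)}."""
    out = {}
    for name, count, length, pct in _FIELD.findall(line):
        count, length, pct = int(count), int(length), int(pct)
        # Assumption: the reported value is the integer part of count/length.
        out[name] = (count, length, pct, count * 100 // length)
    return out

summary = "Identities = 214/333 (64%) , Positives = 266/333 (79%) , Gaps = 2/333 (0%)"
for name, values in parse_blast_summary(summary).items():
    print(name, values)
```

A mismatch between the reported and recomputed percentage may indicate a character-recognition error in one of the counts.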

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 770 

A DNA sequence (GBSx0818) was identified in S.agalactiae <SEQ ID 2367> which encodes the amino 
acid sequence <SEQ ID 2368>. This protein is predicted to be polypeptide deformylase (def-1). Analysis of 
this protein sequence reveals the following: 

3 N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2339 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAC15392 GB:AJ278785 polypeptide deformylase [Streptococcus pneumoniae] 
Identities = 169/204 (82%), Positives = 192/204 (93%), Gaps = 1/204 (0%) 

Sbjct: 

Query: 61 r^AEKLGLRGGVGLAAPQLDISKRIIAVLVPNTODAQGNPPKEAYSLQEVMYNPKVVSHSV 120 

MAEK+GLRGGVGLAAPQLDISKRI IAVLVPN+ + +G P+EAY L+ +MYNPK+VSHSV 
Sbjct: 61 MAEKMGLRGGVGLAAPQLDISJtRIIAvX.VPNIVE-EGETPQEAYDLEAIMYNPKIVSHSV 119 

Query: 121 QDAALSDGEGCLSVDREVPGYWRHARVTIEYFDKTGEKHRLKLKGYNSIWQHEIDHID 180 

QDAAL +GEGCLSVDR VPGYWRHARVT++YFDK GEKHR+KLKGYNSIWQHEIDHI+ 
Sbjct: 120 QDAALGEGEGCLSVDRWPGYvVRHARVTVDYFDKDGEKHRIKLKGYNSIVVQHEIDHIN 179 



Query: 181 GIMFYDRINEKNPFAVKEGLLILE 204 

GIMFYDRINEK+PFAVK+GLLILE 
Sbjct: 180 GIMFYDRINEKDPFAVKDGLLILE 203 
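
In each alignment block the middle row marks an identical residue with its letter and a conservative substitution with "+". The counts in the summary line can therefore be tallied from the three rows. The sketch below is illustrative only: `score_alignment` is a hypothetical helper, and it is applied here to the final block of the alignment above, not to the full-length alignment.

```python
def score_alignment(query, match, sbjct):
    """Return (identities, positives, aligned_length) for one alignment block."""
    if not (len(query) == len(match) == len(sbjct)):
        raise ValueError("all three rows must have the same length")
    # An identity is a column where both sequences carry the same residue.
    identities = sum(q == s and q != "-" for q, s in zip(query, sbjct))
    # In the middle row a letter marks an identity and '+' a conservative
    # substitution; a space marks a mismatch or a gap.
    positives = sum(m != " " for m in match)
    return identities, positives, len(query)

query = "GIMFYDRINEKNPFAVKEGLLILE"
match = "GIMFYDRINEK+PFAVK+GLLILE"
sbjct = "GIMFYDRINEKDPFAVKDGLLILE"
print(score_alignment(query, match, sbjct))
```

Summing the per-block tallies over all blocks of an alignment would be expected to reproduce the counts in its summary line, provided the rows have survived extraction intact.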

A related DNA sequence was identified in S.pyogenes <SEQ ID 2369> which encodes the amino acid 
sequence <SEQ ID 2370>. Analysis of this protein sequence reveals the following: 

3 N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.1745 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
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An alignment of the GAS and GBS proteins is shown below. 

Identities = 160/204 (78%) , Positives = 186/204 (90%) 

Query: 1 MSAIDKLVKRSHLID^IIREGNPTLRKVAEEVTFPLSEKEEILGEKMMQFLKHSQDPI 60 

MSA DKL+K SHLI M+DI IREGNPTLR VA+EV+ PL +++ +LGEKMMQFLKHSQDP+ 
Sbjct: 1 MSAQDKLIKPSHLITMDDIIREGNPTLRAVAKEVSLPLCDEDILLGEKMMQFLKHSQDPV 60 

Query: SI MAEIOjGLRGGVGLAAPQLDISia^IIAVLVPlTTODAQGNPPKFAYSLQEvMYNPKVVSHSV 120 

MAEKLGLR GVGLAAPQ+D+SKRI IAVLVPN+ D +GNPPKEAYS QEV+YNPK+VSHSV 
Sbjct: 61 MAEKLGLRAGVGLAAPQIDVSKRIIAvIiVPlILPDKEGNPPKEAYSWQEvLYNPKIVSHSV 120 

Query: 121 QDAALSDGEGCLSvDREVPGYWRHARVTIEYFDKTGEKHRLKLKGYNSIWQHEIDHID ISO 

QDAALSDGEGCLSVDR V GYWRHARVT++Y+DK G+4HR+KLKGYN+IWQHEIDHI+ 
Sbjct: 121 QDAALSDGEGCLSVDRVVEGYVA/RHARvTVDYYDKEGQQHRIKLKGYNAIWQHEIDHIN 180 

Query: 181 GIMFYDRINEKNPFAVKEGLLILE 204 

G++FYDRIN KNPF KE LLIL+ 
Sbjct: 181 GVLFYDRINAKNPFETKEELLILD 204 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 771 

A DNA sequence (GBSx0819) was identified in S.agalactiae <SEQ ID 2371> which encodes the amino 
acid sequence <SEQ ID 2372>. Analysis of this protein sequence reveals the following: 



Final Results 

bacterial cytoplasm --- Certainty=0.3620 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10177> which encodes amino acid sequence <SEQ ID 
10178> was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAC75224 GB:AE000305 putative transcriptional regulator 
[Escherichia coli K12] 
Identities = 58/191 (30%) , Positives = 98/191 (50%) 

Query: 37 DLQVITLTAGQSVCKQGEQLEYLHYIVKGRFKIVRRLFNGKEHILDIKTKPTLIGDIELL 96 

D ++ A + ++G+Q +L Y+ +GR ++ L NG+ ++-D P IG+IEL+ 
Sbjct: 17 DTRLFHFLARDYIVQEGQQPSWLFYLTRGRARLYATLANGRVSLIDFFAAPCFIGEIELI 76 

Query: 97 TNRQIVSSVIALEDLTVIQLSLKGRKEK^LTDATFLLKLSQEIAQAFHDQNIKASTNLGY 156 

+V A+E+ + L +K + LL D FL KL L+ + + + N + 
Sbjct: 77 DKDHEPRAVQAIEEOTCLALPMKHYRPLLIjMKLFIiRKL 136 

Query: 157 TVKELLASHILAIEEQGYFQLELSSLADSFGVSYRHLLRVIHDMVKEGLIQKEKPKYFIK 216 

+ LA+ IL +E + + + A+ GVSYRHLL V+ + +GL+ K K Y IK 
Sbjct: 137 PLVNRLAAFILLSQEGDLYHEKHTQAAEYLGVSYRHLLYVLAQFIHDGLLIKSKKGYLIK 196 

Query: 217 NRFALESLNIQ 227 

NR L L ++ 
Sbjct: 197 NRKQLSGLALE 207 

A related DNA sequence was identified in S.pyogenes <SEQ ID 2373> which encodes the amino acid 
sequence <SEQ ID 2374>. Analysis of this protein sequence reveals the following: 
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Possible site: 27 
=■>> Seems to have r 



N- terminal signal sequence 



Final Results 

bacterial cytoplasm --- Certainty=0.3809 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



An alignment of the GAS and GBS proteins is shown below. 

Identities = 23/63 (36%) , Positives = 35/63 (55%) , Gaps = 1/63 (1%) 

Query: 146 QNIKASTtILGYTVKELtASHILAIEEQGYFQLELSSL7ADSFGVSYRHLLRVIHDMVKEGL 205 

QN+ N+ YTVKE AS+ L + L L+ LA+ FG S RHL V+ + + + 

Sbjct: 3 QNV-CQQNITYWKERFASYTLEAQANQEVHLNLTLLANRFGTSDRHLKHVLKQPIFQR1 61 


Query: 206 IQK 208 



Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 772 

A DNA sequence (GBSx0820) was identified in S.agalactiae <SEQ ID 2375> which encodes the amino 
acid sequence <SEQ ID 2376>. Analysis of this protein sequence reveals the following: 



Possible site 

>>> Seems to have a cleavable N-term signal seq. 

INTEGRAL Likelihood = -9.24 Transmembrane 163 
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Final Results 

bacterial membrane --- Certainty=0.4694 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



A related GBS nucleic acid sequence <SEQ ID 8659> which encodes amino acid sequence <SEQ ID 8660> 
was also identified. Analysis of this protein sequence reveals the following: 

Lipop: Possible site: -1 Crend: 7 
McG: Discrim Score: -3.52 
GvH: Signal Score (-7.5): 0.340001 
Possible site: 25 
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INTEGRAL Likelihood = -1.81 Transmembrane 273 - 289 ( 272 - 291) 
PERIPHERAL Likelihood = 3.45 193 
modified ALOM score: 2.35 
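
The transmembrane prediction lines in this output follow a fixed pattern (a likelihood, a segment, and a second range in parentheses). For downstream tabulation, such a line can be parsed as sketched below. This is an illustrative Python sketch only: `parse_tm_segment` and the dictionary keys `core` and `extended` are hypothetical names chosen here, and no claim is made about how the prediction program itself defines the two ranges.

```python
import re

# Pattern for lines such as:
# "INTEGRAL Likelihood = -1.81 Transmembrane 273 - 289 ( 272 - 291)"
_TM = re.compile(
    r"INTEGRAL\s+Likelihood\s*=\s*(-?\d+(?:\.\d+)?)\s+Transmembrane\s+"
    r"(\d+)\s*-\s*(\d+)\s*\(\s*(\d+)\s*-\s*(\d+)\s*\)"
)

def parse_tm_segment(line):
    """Return a dict describing one predicted segment, or None if no match."""
    m = _TM.search(line)
    if m is None:
        return None
    start, end, outer_start, outer_end = (int(g) for g in m.groups()[1:])
    return {
        "likelihood": float(m.group(1)),
        "core": (start, end),                   # first range on the line
        "extended": (outer_start, outer_end),   # range given in parentheses
        "core_length": end - start + 1,         # residues, inclusive
    }

line = "INTEGRAL Likelihood = -1.81 Transmembrane 273 - 289 ( 272 - 291)"
print(parse_tm_segment(line))
```

Lines in which the ranges have been truncated by extraction will simply fail to match and return `None`, rather than yield partial coordinates.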

*** Reasoning Step: 3 

Final Results 

bacterial membrane --- Certainty=0.4694 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
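
Each "Final Results" block lists three candidate localizations with a certainty value, and the localization with the highest certainty is the one marked "Affirmative". The selection can be sketched as follows. This is a minimal Python sketch under stated assumptions: `parse_final_results` and `best_location` are hypothetical helper names, and the regular expression is deliberately tolerant of stray spaces inside the numbers, which occur in extracted copies of this text.

```python
import re

# One result line: localization name, certainty value, verdict in parentheses.
_RESULT = re.compile(
    r"bacterial\s+(\w+)\W*Certainty\s*=\s*([\d .]+?)\s*\(([^)]*)\)"
)

def parse_final_results(text):
    """Return {localization: (certainty, verdict)} for one results block."""
    results = {}
    for site, value, verdict in _RESULT.findall(text):
        # Remove any spaces that extraction inserted inside the number.
        results[site] = (float(value.replace(" ", "")), verdict.strip())
    return results

def best_location(results):
    """Return the localization with the highest certainty."""
    return max(results, key=lambda site: results[site][0])

block = """
bacterial membrane --- Certainty=0.4694 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>
"""
parsed = parse_final_results(block)
print(best_location(parsed), parsed)
```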

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB50057 GB:AJ248286 TRANSPORT PROTEIN, permease [Pyrococcus abyssi] 
Identities = 94/382 (24%) , Positives = 173/382 (44%) , Gaps = 30/382 (7%) 

Query: 5 MEKLSLLSL-SLILLSTFSTSPALPQMISYY-RDKGLPSPQVELLFSIPSMAIIFILLIT 62 

MEKL +L h SL + +S A+P + +D G+ + ++ LL + + I + 

Sbjct: 1 MEKLIILILISLGWIFNYSHRMAVPSLAPIIMKDLGINNAEIGLLMTSLLLPYSLIQVPA 60 

Query: 63 PWLSKKLSEKHMIIFGLLLTALGGGLPWSQNVLLVFVSRLLLGSGIGFINTRAISVISE 122 

++ K+ K ++ +L +L L V++++Y + R L G G A ++ISE 

Sbjct: 61 GYIGDKIGRKKLLTISILGYSLSSALIVIiTRDYiroLVTVRALYGFFAGLYYAPATALISE 120 

Query: 123 YYQGKERRKLLGLRGSFEVLGNA GLTAL- -VGLLLTFGWSKSFMIYFLALPILVLYL 177 

++ ++ L F ++G A G+T L V + LT W +F++ + 1+ + L 

Sbjct: 121 LFRERKGSAL GFFMVGPAIGSGITPLIWPVALTLSWRYAFLVLSIMSSIVGILL 175 

Query: 178 VFAPKKWKDTNDKI KTKGQKI PKADLTYI VALAI LAGFVI TINTGINLRI PLLWEFGL 237 

+ A K + IK +G K ++++LA G + + LV G+ 

Sbjct: 176 MVAIK GEPIKVEtSVKFKIPRGVFLLSLANFLGLGAFFAM-LTFLVSYLVSR-GV 227 

Query: 238 GTPAQASLVLSAMMLMGIIAGMSFGQLIAMFHKQLIPICLVLFS-LTLLGVGLPSNLMVL 295 

G +ASL+ S + L+GI+ + G L K + + L S LT L + +PS L ++ 

Sbjct: 228 GME-KASLMFSMLSLVGILGSIIAGFLYDHLGKVSVLLAYALNSLLTFLVIVIPSPLFLI 286 

Query: 297 TISAMASGFLYSL--MVTAVFSLVADRVEYSLVGSATTLVLVF-CNIGGASAAILLSCFD 353 

+ + LYS+ ++TA S A R +V 4-V F IG L+ 

Sbjct: 287 PLGLV LYSVGGIMTAYTSEKASRENLGWMGPTNMVGFFGATIGPYIVGFLIDRLG 342 

Query: 354 HLLGQINAVFYVYAILSLAVGM 375 

+ L + +V Y + ++ +G+ 
Sbjct: 343 YSLALL-SVPLAYLVSAVIIGL 363 

There is also homology to SEQ ID 2378. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 773 

A DNA sequence (GBSx0821) was identified in S.agalactiae <SEQ ID 2379> which encodes the amino 
acid sequence <SEQ ID 2380>. Analysis of this protein sequence reveals the following: 

Possible site: 23 

>>> Seems to have no N-terminal signal sequence 

INTEGRAL Likelihood = -1.38 Transmembrane 171 - 187 ( 171 - 187) 



Final Results 

bacterial membrane --- Certainty=0.1553 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database. 
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>GP:CAB61731 GB:AL133220 putative oxidoreductase . [Streptomyces 
coelicolor A3 (2)] 

Identities = 101/327 (30%) , Positives = 169/327 (50%) , Gaps = 12/327 (3%) 

Query: 8 WATLGTGVIANEL-AQALE1RRGQKLYSVANRTYDKGLEFATKYGIQKVYDHIDQVFEDPE 66 

W L TG +A A ++ ++ +VA+RT FA ++GI + Y + + D + 

Sbjct: 11 WGILATGG^WffiFTADLVDLPDAEWAVASRTEASAKTFAERFGIPRAYGGVIETIlARDED 70 

Query: 67 VDIIYISTPHNTHISFLRICALANGI<HVLCEKSITLNETELKEAIDLAETNHVV1AEAMTI 126 

VD++Y++TPH+ H + L G++VLCKK TLN+ E E + LA N V L EAM + 

Sbjct: 71 VDWWATPHSAHRTAAGLCLEAGRNVLCEKPFTLNARFAAELVAIARENGVFLMEAMW 130 

Query: 127 FHMPIYRQLKTLVDSGKLGPLKMIQMNFGSYK3YDKTNRFFSRDLAGGALLDIGVYALSC 186 

+ p+ R+LK LV G +G ++ +Q +FG + +R GGALLD+GVY +S 

Sbjct: 131 YCNPLVRRLKELVADGAIGEVRSLQADFGLAGPFPAAHRLRDPAQGGGALLDLGVYPVSF 190 

Query: 187 IRWFMSEAPHNITSQVTFAPTGVDEQVGILLTNPANEMATVSLSLHAKQPKRATIAYDKG 246 

+ + E P ++ ++ + GVD Q G LL+ + +A++ S+ P A+I +G 
Sbjct: 191 AQLLLGE-PTDVAARAVLSEEGVDLQTGALLSYGNDALASIHCSITGGTPNSASITGSEG 249 

Query: 247 YIEL FEYPRGQKAVITYTEDGHQDIL- -EAGKTENALQYEVADMEEAV-SGKTNH-- 298 

1++ F +P V+ T Q+ A +L++E ++ A+ +G+T 

Sbjct: 250 RIDVPNGFFFP--DHFVLHRTGRDPQEFRADPADGPRESLRHEAEEVMRALRAGETESPL 3 07 

Query: 299 MYIjNYTKDVMDIMTQLRQEWGFTYPEE 325 

+ L+ T VM + +R G YP E 
Sbjct: 308 VPLDGTLAVMRTLDAI RDRVGVRYPGE 334 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

Example 774 

A DNA sequence (GBSx0822) was identified in S.agalactiae <SEQ ID 2381> which encodes the amino 
acid sequence <SEQ ID 2382>. This protein is predicted to be oligopeptidase. Analysis of this protein 
sequence reveals the following: 

Possible site: 19 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2881 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAC14579 GB:AJ249396 oligopeptidase [Streptococcus thermophilus] 
Identities = 504/631 (79%) , Positives = 563/631 (88%) 

Query: 1 MIKYQDDFYQAWGEWAKTAVIPDDKPRTGGFSDLADDIEALMLSTTDKWLADENKPSDT 60 

M + QDDFY A+NGEW KTAVIPDDK? TGGFSDLAD+IE LML TTD+WLA EN P + 
Sbjct: 1 MTRLQDDFYHAINGEWEKTAVIPDDKPCTGGFSDLADEIEDLMLETTDQWLAGENVPDNA 60 

Query: 61 ILNHFIAFHKMTADYQKREEVGVSPVLPLIEEYKGLQSFSEFASKVAEYELEGKPNEFPF 120 

IL +FI FH+MTADY +RE VG+ PV PLIEEYK L SFSEFASK+AEYE+ GKPNEFPF 
Sbjct: 61 ILQNFIKFHRMTADYDRREAVGIEPVKPLIEEYKKLSSFSEFASKIAEYEMSGKPNEFPF 120 

Query: 121 GvAPDFMNAQLNVLWAEAPGIILPDTTYYSEDNEKGKELLAFWRKSQEDLLPLFGIiSEQE 180 

V+PDFMNAQLNVLWA+APGI ILPDTTYY+EDNEKGKELL WR+ QE+LL +G + +E 
Sbjct: 121 SVSPDFMNAQLNVLWADAPGIILPDTTYYTEDNEKGKELLEIWREMQEELLGKYGFTAEE 180 

Query: 181 IKDILDKvXiALDAKLAQYVLSREESSEYVKLYHPYNWEDFTKLAPELPLDAIFQKILGQK 240 

IKD+LDKV+ LDAKLA+YVLS EESSEYV+LYHPY+W DFTKLAPELPLD+IF +ILGQ 
Sbjct: 181 IKDLLDK^IDLDAKLAKYVLSHEESSEYVELYHPYDWADFTKLAPELPLDSIFTEILGQV 240 
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Query: 241 PDKVIVPEERFWTEFASDYYSESKWELLKADL.ILSAMnAWAYLTDDIRIKSGVYSRALS 300 

PDKVIV EE FWTEFA++YYSE+NWELLKA L++ A + +NAYLTD++R+ SG YSRALS 
Sbjct: 241 PDKVIVSEESFWTEFAAEYYSFANWELLKAVLLIDATTSWNAYLTDELRVLSGKYSRALS 300 

Query: 301 GTPQAMDKKICAAYYIASGPYNQALGLVnfAGEKFSPEAI\ADWHKIATMIDVYKSRLEKAD 360 

GTPQAMDKKKAA+YLA GPYNQALGLWYAGEKFSPEAKADVE K+ATMIDVYKSRL+ AD 
Sbjct: 301 GTPQAMDKKKAAFYIAQGPYMQALGLWYAGEKFSPEAKADVEAKVATMIDVYKSRLQTAD 360 

Query: 361 VO^QSTREKAIMKLNVITPHIGYPEKLPETYTKKIIDPKLSLVENATNLDKISIAYGWSK 420 

WLA TREKAI KLNVITPHIGYPEKLPETY KKIID LSLVENA L + ISIA+ WSK 
Sbjct: 361 WLAPETREKAITKLNVITPHIGYPEKLPETYDKKIIDENLSLVENAQKLVEISIAHSWSK 420 

Query: 421 VWKPVDRSEWHMPAHMVNAYYDPQQNQIVFPAAILQEPFYALEQSSSANYGGIGAVIAHE 480 

WNKPVDRSEWHMPAHMVNAYYDPQQNQIVFPAAILQ PFY + QSSSANYGGIGAVIAHE 
Sbjct: 421 WNKPVDRSEWHMPAHMVNAYYDPQQNQIVFPAAILQAPFYDIAQSSSANYGGIGAVIAHE 480 

Query: 481 ISHAFDTNGASFDEHGSLNNW1OTDEDFEMK1CLTDKVVEQFDGLESYGAKVNGKLTVSEN 540 

ISHAFDTNGASFDE+GSL NWWT++D+ AFK+ TDK+V+QF+GL+SYGAKVNGKLTVSEN 
Sbjct: 481 ISHAFDTNGASFDENGSLKNWWTEDDYAAFKERTDKIVDQFEGLDSYGAKVNGKLTVSEN 540 

Query: 541 VADLGGVACALFAAQRESDFSARDFFINFATIWRMKARDEYMQMLASVDVHAPAQWRTNI 600 

VADLGGVACALEAA+R+ DFS R+FFINFATIWR KAR+EYMQMLASVDVHAPA+WRTN+ 
Sbjct: 541 VADLGGVACALEAAKRDEDFSVREFF INFAT I WRTKAREEYMQMLASVDVHAPAKWRTNV 600 

Query: 601 1 

Sbjct: 601 IVTNFDEFHI 



Endopeptidases are often exposed antigens. 

A related DNA sequence was identified in S.pyogenes <SEQ ID 2383> which encodes the amino acid 
sequence <SEQ ID 2384>. Analysis of this protein sequence reveals the following: 
Possible site: 51 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2622 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 504/631 (79%) , Positives = 564/631 (88%) 
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GTPQAMDK+KAA+YLA GP++QALGLWYAG4-KFSPEAKADVE K+A MI+VYKSRLE AD 
Sbjct: 301 GTPQAMDKQKAAFYIAQGPFSQALGLWYAGQKFSPEAKADVESKVARMIEVYKSRLETAD 360 

Query: 361 WLAQSTREKAIMKmVITPHIGYPEKLPETYTKKIIDPKLSLVENATNLDKISIAYGWSK 420 

WLA +TREKAI KLNVITPHIGYPEKLPETY KK+ID LSLVENA DJL KI + IA+ WSK 
Sbjct: 361 WXAPATREKAITKLNVITPHIGYPEICLPETYAKiCVIDESLSLVENAQNLAKITIAHTWSK 420 

Query: 421 WNKPVDRSEWHMPAH^WNAYYDPQQNQIVFPAAILQEPFYALEQSSSANYGGIGAVIAHE 480 

WNKPVDRSEWHMPAH+WAYYD QQNQIVFPARILQEPFY+L+QSSSANYGGIGAVIAHE 
Sbjct: 421 WNKPVDRSEWHMPAHLVNAYYDLQQNQIVF?Aa.ILQEPFYSLDQSSSANYGGIGAVIAHE 480 



Query: 541 VADLGGVACALEAAQRESDFSARDFFINFATIWRMKARDEYMQMLASVDVHAPAQlffiTNI 600 

VADLGGVACALEAAQ E DFSARDFFINFATIWFMKAR+EYMQMIiAS+DVHAP + RTN+ 
Sbjct: 541 VADLGGmCALFAAQSEEDFSARDFFIHFATIWRMKAREEYMQMIASIDVHAPGELRTNV 600 

Query: 601 TVTNFEEFHKEFDVKDGDNMWRPVEKRVI IW 631 

T+TNF4- FH+ FD+K+GD MWR + RVIIW 
Sbjct: 601 TLTNFDAFHETFDIKEGDAMWRAPKDRVIIW 631 

SEQ ID 2382 (GBS193) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 23 (lane 3; MW 73kDa). 

The GBS193-His fusion product was purified (Figure 196, lane 5) and used to immunise mice. The 
resulting antiserum was used for Western blot (Figure 253). These tests confirm that the protein is 
immunoaccessible on GBS bacteria. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 



Example 775 

A DNA sequence (GBSx0823) was identified in S.agalactiae <SEQ ID 2385> which encodes the amino 
acid sequence <SEQ ID 2386>. This protein is predicted to be immunity protein (mccF-1). Analysis of this 
protein sequence reveals the following: 

3 N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.1627 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 9433> which encodes amino acid sequence <SEQ ID 9434> 
was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 



Query: 1 MSFSKHYLENDILYSASITSRVEDLHEAFADPSVDAILATIGGFNSNELLPYLDYDLISK 60 

++ ++H E + S+SI SRV DLH AF DP V AIL T+GGFNSN+LL YLDY+ I + 
Sbjct: 43 VTIAFJiANECNEFDSSSIESRViroLHAAFFDPGVKAILTTLGGFNSNQLLRYLDYEKIKR 102 

Query: 61 NPKI ICGYSDSTAFLNAI FAKAKIQTYMGPAYSSFKMKEGQPYQTQAWLT - AMTENHYEL 119 

+PKI+CGYSD TA NAI+ K + TY GP +S+F MK+G Y + +L+ +++ +E+ 
Sbjct: 103 HPKILCGYSDITALCNAIYQRIGLVTYSGPHFSTFAMKKGLDYTEEYFLSCCASDDPFEI 162 
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Query: 120 WPSEEWSSDPWYDPSKPRQFFPTEt'IK- IYHHGKASGTI IGGNLSTFGLLRGTPYAPKIER 178 

PS EWS D W+ + R+F+P 4 G A GT+IGGNL T LL+GT Y P+ E 

Sbjct: 163 HPSSEWSDDRWFLDQENRRFYPNWGPWIQEGYAEGTLIGGNDCTLNLLQGTEYFPETEH 222 

Query: 179 YVLLIEEAEESNFYEFDRNLM.I--LQAYPHPQAILMGRFPKECGMTPQVFEYILSKHAI 236 

+LLIE+ S+ 4 FDR+L ++ Ii A+ H +AIL+GRF K 44 4 4 44 
Sbjct: 223 TILLIEDDYMSDIHMFDRDLQSLIHLPAFSHVKAILIGRFQKASNVSIDLVKAMIETKKE 282 

Query: 237 FKEIPVIYDMDFAHTQPLLTVTIGAELSVD 266 

IP+I ++4 HT P+ T IG 44 
Sbjct: 283 LSGIPIIANINAGHTSPIATFPIGGTCRIE 312 

A related DNA sequence was identified in S.pyogenes <SEQ ID 2387> which encodes the amino acid 
sequence <SEQ ID 2388>. Analysis of this protein sequence reveals the following: 

3 N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.1162 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 75/252 (29%) , Positives = 125/252 (48%) , Gaps = 22/252 (8%) 

VDAILATIGGFNSNELLPYLDYDLISKNPKIICGYSDSTAFLNAIFAKAKIQTYMGPAYS 93 
VD 1+ +IGG+NSN +L Y+DYDL + I GYSD+TA A++ K TY+ + 
VDVIMTS IGGYNSNS VLKYIDYDLFKQKPPIFIGYSDTTALALALYKKTGCITYLSQSVI 6 0 
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Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 776 

A DNA sequence (GBSx0824) was identified in S.agalactiae <SEQ ID 2389> which encodes the amino 
acid sequence <SEQ ID 2390>. Analysis of this protein sequence reveals the following: 

Possible site: 15 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.3112 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
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No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 



Example 777 

A DNA sequence (GBSx0825) was identified in S.agalactiae <SEQ ID 2391> which encodes the amino 
acid sequence <SEQ ID 2392>. Analysis of this protein sequence reveals the following: 



d N-terminal signal sequence 



Final Results 

bacterial cytoplasm --- Certainty=0.6171 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



A related GBS nucleic acid sequence <SEQ ID 10175> which encodes amino acid sequence <SEQ ID 
10176> was also identified. 

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 



Example 778 

A DNA sequence (GBSx0826) was identified in S.agalactiae <SEQ ID 2393> which encodes the amino 
acid sequence <SEQ ID 2394>. Analysis of this protein sequence reveals the following: 



i uncleavable N-term signal seq 
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INTEGRAL 
INTEGRAL 



Likelihood = 
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Likelihood = 
Likelihood = 
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Likelihood = 
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Transmembrane 
Transmembrane 
Transmembrane 315 - 331 ( 307 
Transmembrane 186 - 202 ( 180 
Transmembrane 
Transmembrane 



75 



27 



Transmembrane 



406 ( 382 - 

43 ( 27 - 

123 ( 105 - 

289 ( 273 - 

Final Results 

bacterial membrane --- Certainty=0.5076 (Affirmative) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>



The protein has homology with the following sequences in the GENPEPT database. 

>GP:CAB15347 GB:Z99121 similar to hypothetical proteins [Bacillus subtilis] 
Identities = 174/524 (33%) , Positives = 275/524 (52%) , Gaps = 13/524 (2%) 

Query: 1 MEETILIVSFLLFLILSNVINRIFPKLPLPFIQLVFGILSGLVFHKSQVHIDPELFLAFV 60 

M+ ++++ L + +SN++NR P +P+P IQ+ GIL+ ++ ELF 

Sbjct: 1 MDIFLVVLVLLTIIAISNIVNRFIPPIPVPLIQVAIfilLRASFPCGLHFELOTELFFVLF 60 

Query: 61 IAPMFREGQESDIGSFIKYRAIILYLILPTVFLTAIVVGYVAGHLLPVSLPLAACFALG 120 

IAPL F +G+ + RA IL L L VF T IV GY ++P ++PIAA F L 

Sbjct: 61 IAPLLFNDGKRTPRAELWNLRAPILIiALGLVFATVIVGGOTIHWMI P -AIPLAAAFGLA 119 
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AS AVI YN ++++L +Q 



No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

Example 779 

A DNA sequence (GBSx0827) was identified in S.agalactiae <SEQ ID 2395> which encodes the amino 
acid sequence <SEQ ID 2396>. Analysis of this protein sequence reveals the following: 

Possible site: 23 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.3494 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 780 

A DNA sequence (GBSx0828) was identified in S.agalactiae <SEQ ID 2397> which encodes the amino 
acid sequence <SEQ ID 2398>. This protein is predicted to be integrase (phage-relatedpr). Analysis of this 
protein sequence reveals the following: 

Possible site: 61 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.5094 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10173> which encodes amino acid sequence <SEQ ID 
10174> was also identified. 

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAF12706 GB:AF066865 integrase [bacteriophage TPW22] 
Identities = 171/353 (48%), Positives = 253/353 (71%), Gaps = 1/353 (0%) 






Y QE+L+KF++QIK AMK+AV E+V++ NEA+ K KS++ 



5 ++Y SYF Yh AVTG4RF+E +GLTOS +DF + I +++DYS T +FA+ K 



NESSKRK+PI S TI +L++YKK +W N +RV + +SN+ NK IK I GRKV HSL 



There is also homology to SEQ ID 578. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 781 

A DNA sequence (GBSx0829) was identified in S.agalactiae <SEQ ID 2399> which encodes the amino 
acid sequence <SEQ ID 2400>. Analysis of this protein sequence reveals the following: 

Possible site: 56 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.3377 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 
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Example 782 

A DNA sequence (GBSx0830) was identified in S.agalactiae <SEQ ID 2401> which encodes the amino 
acid sequence <SEQ ID 2402>. This protein is predicted to have homology to a cI-like repressor. Analysis of 
this protein sequence reveals the following: 

N- terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.0827 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAD44097 GB:AF115103 orf122 gp [Streptococcus thermophilus 
bacteriophage Sfi21] 
Identities = 57/125 (45%) , Positives = 77/125 (61%) , Gaps = 5/125 (4%) 

Query: 3 MKlDQLCKEFGvELCLFDASDWHSSGFYNPITIWIiSVDVNLSEQEQKQVALHELQHKNHF 62 

M +L ++FGV LC F +S W GF +P+ +V+ ++ +L + + +V LHEL H H 
Sbjct: 1 MNESELLEQFGVSLCEFSSSQWTRDGFLDPVNRWYINRDLPTERRLKVLLHELGHLEHD 60 

Query: 63 PYQYQLFRERCELDANPJmiHHLLKEELEIAEDHTQrWLVFMEIQrKLKTIADEAMIKEE 122 

P QY+ RE+ E ANRNMIH LLK E+ FNY+ FMEKY L TI DE +K E 

Sbjct: 61 PKQYERLREKYEACANRNMIHELLKN HslLDNFNYVHFMEKYNLTTICDETFVKNE 115 

Query: 123 YLNLV 127 
YL L+ 

Sbjct: 116 YLKLI 120 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

Example 783 

A DNA sequence (GBSx0831) was identified in S.agalactiae <SEQ ID 2403> which encodes the amino 
acid sequence <SEQ ID 2404>. This protein is predicted to be EpsR protein. Analysis of this protein 
sequence reveals the following: 

3 N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.4692 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAF12710 GB:AF066865 repressor protein [bacteriophage TPW22] 
Identities = 36/101 (35%) , Positives = 62/101 (60%) , Gaps = 7/101 (6%) 

Query: 4 LIDRIRELSNKKGMSIOTLEDTLGYSRNSLYSIjNE-NSKMGKPKEIAQYFNVSLDYIjLGL 62 
L ++I+EL+++K +S+ +E+ LG++ ++ + N + K K++A+YFNVS+D+LLGL 

Sbjct: 3 LYEKIKEIASQKOTSIRQVEEKLGFANGTIRQWGKKNPGINKVKDVAKYFNVSVDFLLGL 62 

Query: 63 TDNPRIAS - - DETAI IDGQWDLREAAAHTMLFDGKPLDED 101 
DN R D +D V+ E + FDGKPL ++ 
Sbjct: 63 DDNQRKKEPVDLADFVDDNKVNWDEWVS FDGKPLSDE 99 



WO 02/34771 PCT/GB01/04789 
-884- 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

Example 784 

A DNA sequence (GBSx0832) was identified in S.agalactiae <SEQ ID 2405> which encodes the amino 
acid sequence <SEQ ID 2406>. Analysis of this protein sequence reveals the following: 

Possible site: 43 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.4079 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

Example 785 

A DNA sequence (GBSx0833) was identified in S.agalactiae <SEQ ID 2407> which encodes the amino 
acid sequence <SEQ ID 2408>. Analysis of this protein sequence reveals the following: 

Possible site: 52 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.2942 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0030 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10171> which encodes amino acid sequence <SEQ ID 
10172> was also identified. 

The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 786 

A DNA sequence (GBSx0834) was identified in S.agalactiae <SEQ ID 2409> which encodes the amino 

acid sequence <SEQ ID 2410>. This protein is predicted to be a replication initiation protein Rep (RC). 

Analysis of this protein sequence reveals the following: 

Possible site: 54 
>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.3335 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
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The protein has no significant homology with any sequences in the GENPEPT database. 
No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes, could be useful antigens for 
vaccines or diagnostics. 

Example 787 

A DNA sequence (GBSx0835) was identified in S.agalactiae <SEQ ID 2411> which encodes the amino 
acid sequence <SEQ ID 2412>. This protein is predicted to be antirepressor. Analysis of this protein 
sequence reveals the following: 

Possible site: 40 
>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.3380 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:BAA97816 GB:AB044554 antirepressor [Staphylococcus aureus 
prophage phiPV83] 

Identities = 70/153 (45%) , Positives = 93/153 (60%) , Gaps = 15/153 (9%) 

Query: 3 EIFVFHGQEWTVTINNEPWFVGKD\MILGYSKSRNAIALHVDEDDALKQGITDNLGRM 62 

+ F F VRTV I NEP+FVGKD+A+ILGY+++ NAI HVD +D L + + G+ 
Sbjct: 5 QTFNFKELPVRTVEIENEPYWGKDIAEILGYARTDNAIRNHVDSEDKLTHQFSAS-GQN S3 


Query: 63 QETII INESGLYSLIL SSKLPQVKE FKRWVTSEVLPQIRQQGAYVPENLSDE 114 

+ IIINESGLYSLI SK +++E FKRWvTS+VLP IR+ G Y +N+ ++ 

Sbjct: 64 RNMI1INESGLYSLIFDASKQSKNEKIRETARKFKRWVTSDVLPAIRKHGIYATDNVIEQ 123 

Query: 115 A FIALFTGQKKLKEHQIALAQDVDYLE 141 

I+T KKKE LLQV+ K 
Sbjct: 124 TLKDPDYI ITVLTEYKKEKEQNLVLQQQVEVNK 156 

A related DNA sequence was identified in S.pyogenes <SEQ ID 2413> which encodes the amino acid 
sequence <SEQ ID 2414>. Analysis of this protein sequence reveals the following: 

Possible site: 17 

>>> Seems to have no N-terminal signal sequence 

Final Results 

bacterial cytoplasm --- Certainty=0.4609 (Affirmative) < succ>

bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>

bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

An alignment of the GAS and GBS proteins is shown below. 

Identities = 54/142 (38%) , Positives = 73/142 (51%) , Gaps = 7/142 (4%) 



Query: 11 EWTWINNEPWFVGKDVADILGYSKSRNAIALHVDEDDALKCGITDNLGRMQETIIINE 70 

EVRT TINN+ +F D IL S R I +++D I D+LGR Q+ INE 

Sbjct: 13 EWTATINNQIYFNLNDCCQILELSNPRKTIE-RLNKDGVTTSDIIDSLGRTQQANFINE 71 

Query: 71 SGLYSLILSSKLPQVKEFKRWVTSEVLPQIRQQGAYVPENLSDEA FIALFTGQK 124 

S Y L+ S+ P+ ++F WVTSEVLP IR+ GAY+ E ++A I L K 

Sbjct: 72 SNFYKLVFQSRKPEAEKFADVA^ , SE\a,PSIRla^GAY^WEQTLEQALTSPDFLIRIlANELK 131 



Query: 125 KLKEHQLALAQDVDYLKNEQPI 146

           + KE L + L E +

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.
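
The Identities, Positives and Gaps figures quoted in the alignment summaries of this Example can be cross-checked mechanically. The following is a minimal sketch, not part of the original analysis: the names `FIELD`, `parse_summary` and `truncated_percent` are hypothetical, and the assumption that percentages are truncated rather than rounded is inferred only from the figures 70/153 (45%), 93/153 (60%) and 15/153 (9%) quoted above.

```python
import re

# One "Name = n/d (p%)" field of a BLAST-style summary line.
# Optional whitespace tolerates the spacing found in extracted text.
FIELD = re.compile(
    r"(Identities|Positives|Gaps)\s*=\s*(\d+)\s*/\s*(\d+)\s*\((\d+)%\)"
)


def parse_summary(line):
    """Return {field: (count, alignment_length, reported_percent)}."""
    return {
        name: (int(count), int(length), int(percent))
        for name, count, length, percent in FIELD.findall(line)
    }


def truncated_percent(count, length):
    """Integer percentage, truncated (assumed convention, see lead-in)."""
    return 100 * count // length


summary = parse_summary(
    "Identities = 70/153 (45%) , Positives = 93/153 (60%) , Gaps = 15/153 (9%)"
)
for name, (count, length, reported) in summary.items():
    # Print the reported value next to the recomputed one for comparison.
    print(name, count, length, reported, truncated_percent(count, length))
```

A mismatch between the reported and recomputed value would suggest a transcription error in one of the two numbers rather than a property of the alignment itself.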

Example 788 

A DNA sequence (GBSx0836) was identified in S.agalactiae <SEQ ID 2415> which encodes the amino 
acid sequence <SEQ ID 2416>. This protein is predicted to be ell. Analysis of this protein sequence 
reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results
bacterial cytoplasm --- Certainty=0.3281 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAC27227 GB:AF009630 ell [bacteriophage bIL170]
Identities = 66/161 (40%), Positives = 93/161 (56%), Gaps = 13/161 (8%)

Query: 15 YQVSNLGRTOSIGRTVNAKQRTRKTKGRILKQSL-SSGYAIVTLSVNGLRKSIRVHRLVA 73 

Y+VSNLG+VR+I GRILK + +GY + L N +K++ +HR++A 

Sbjct: 16 YEVSNLGKVRNI KSGRILKPWIVPNGYLMHQLCENNKKKNLPLHRIIA 63 



Query: 74 

AFI NP K +NHIDENKLNN ++NLEW T KEN HGR++ KVQ L 
Sbjct: 64 TAFIDNPEEKPQVNHIDENKLNNDI^LEWCWKENNIHGTRMKRIAEKIIFKKVIQLDIJSI 123 

Query: 134 GEFINTFDSIKSASMKTGISSQRITATAMGHQKQTHGYKWR 174 

+N F+S+ A +TG+S + 1++ G +K +KWR 
Sbjct: 124 DNVLNEFESMVQAEQETGVSRRNISSCCNGKRKSAGRFKWR 164 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 789 

A DNA sequence (GBSx0837) was identified in S.agalactiae <SEQ ID 2417> which encodes the amino
acid sequence <SEQ ID 2418>. Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results
bacterial cytoplasm --- Certainty=0.2357 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10169> which encodes amino acid sequence <SEQ ID
10170> was also identified.

The protein has no significant homology with any sequences in the GENPEPT database.

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 790 

A DNA sequence (GBSx0838) was identified in S.agalactiae <SEQ ID 2419> which encodes the amino
acid sequence <SEQ ID 2420>. Analysis of this protein sequence reveals the following:

Possible site: 57
>>> Seems to have an uncleavable N-term signal seq
INTEGRAL Likelihood = -5.47 Transmembrane 21 - 37 ( 19 - 38)

Final Results
bacterial membrane --- Certainty=0.3187 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.
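
The "Final Results" blocks reported throughout these Examples each list a certainty value per predicted localisation. As an illustration only, the sketch below reads such a block and picks the localisation with the highest certainty. The names `ROW`, `parse_final_results` and `most_likely_site` are hypothetical, and the regular expression is an assumption about the line layout (it tolerates stray spaces inside the number and optional dash separators); it is not part of the prediction software itself.

```python
import re

# One row of a "Final Results" block, for example:
#   bacterial membrane --- Certainty=0.3187 (Affirmative) < succ>
# Stray spaces inside the number (as in "0 .3187") are tolerated.
ROW = re.compile(
    r"bacterial\s+(\w+)\s*[-\u2014]*\s*Certainty\s*=\s*"
    r"([0-9][0-9.\s]*?)\s*\((.*?)\)"
)


def parse_final_results(text):
    """Return {localisation: (certainty, verdict)} for every row found."""
    results = {}
    for site, value, verdict in ROW.findall(text):
        results[site] = (float(value.replace(" ", "")), verdict)
    return results


def most_likely_site(results):
    """Return the localisation with the highest certainty value."""
    return max(results, key=lambda site: results[site][0])


# Values taken from the Final Results block of this Example.
example = """
bacterial membrane Certainty=0 .3187 (Affirmative) < succ>
bacterial outside Certainty=0 .0000 (Not Clear) < succ>
bacterial cytoplasm Certainty=0 .0000 (Not Clear) < succ>
"""
results = parse_final_results(example)
print(most_likely_site(results), results)
```

Keeping the verdict string alongside the number preserves the distinction between "Affirmative" and "Not Clear" that the blocks report.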

Example 791 

A DNA sequence (GBSx0839) was identified in S.agalactiae <SEQ ID 2421> which encodes the amino
acid sequence <SEQ ID 2422>. This protein is predicted to be DNA polymerase III delta prime subunit
(dnaB). Analysis of this protein sequence reveals the following:

>>> Seems to have no N-terminal signal sequence

Final Results
bacterial cytoplasm --- Certainty=0.0544 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database. 

>GP:AAF98347 GB:AF280763 DNA polymerase III delta prime subunit [Streptococcus pyogenes]
Identities = 284/444 (63%), Positives = 357/444 (79%), Gaps = 4/444 (0%)

Query: 3 ELKVLPHDIQAEQSVLGSIFIKPEKMIEVAEYLKPNDFYRPAHKILFKAMVSLADRGEAI 62 

EL+V P D+ AEQSVLGSIFI P+K+I V E++ P+DFY+ AHKI+F+AM++L+DR +AI 
Sbjct: 8 ELRVQPQDLLAEQSVLGSIFISPDKLIAVREFISPDDFYKYAHKIIFRAMITLSDRNDAI 67 

Query: 63 DIVTIKSTLESTDELGMVGGISyiAEIVNAVPTSSHAEHYAKIVAKKAQLRSIIDNLSDS 122 

D TI++ L+ D+L +GG+SYI E+VN+VPTS++AE+YAKIVA+KA LR II L++S 
Sbjct: 68 DATTIRTILDDQDDLQSIGGLSYIVELVNSVPTSANAEYYAKIVAEKAMLRDIIARLTES 127 

Query: 123 IGNAYDEDMDIDEIIAKAERSLI3VSQASNKSSFRPIHDVLLENHSKIEERSNNTSQITG 182 

+ AYDE + +E+IA ER+LIE+++ SN+S FR I DVL N+ +E RS TS +TG 
Sbjct: 128 VNLAYDEILKPEEVIAGVERALIEIiNEHSNRSGFRKISDVLKVlTYEALEARSKQ^ 187 

Query: 183 IETGFYDFDKLITGLHEDQLI vLAARPAMGKTALALNIAQNVATKSNKAVAVFSLEMGAE 242 

+ TGF D DK+ TGLH DQL++IAARPA+GKTA LNIAQNV TK K VA+FSLEMGAE 
Sbjct: 188 LPTGFRDLDKITTGLHPDQLVIIiAARPAVGKTAFVLHIAQWGTKQKKTVAIFSLEMGAE 247 

Query: 243 SLVERMLSAEGTIINHHIRTGNLTVNETORLIYAQGQLAEAPIFIDDTAGVKITDIRARA 302 

SLV+RML+AEG + +H +RTG LT +W + AQG LAEAPI+IDDT G+KIT+IRAR+ 
Sbjct: 248 SLVDRMLAREGWVDSHSLRTGQLTDQDWISnSIVTIACjGAIAEAPIYIDDTPGIKITEIRARS 307

Query: 303 RRLSQETO-GLGLIVIDYLQLIQGSRSDNRQ3EVSEISRQLKIIAKELKVPVIALSQLSR 361

R+LSQE D GLGLIVIDYLQLI G++ +NRQQEVS+ISRQLKI+AKELKVPVIALSQLSR 
Sbjct: 308 RKLSQEVDGGLGLIVIDYLQLITGTKPENRQQEVSDISRQLKIIAKELKVPVIRLSQLSR 367 

Query: 362 GVEQRNDKRPIMSDLRESGSIEQDADIVAFLYRDAYYQ DKKEGQPENDITELIIRKN 418 

GVEQR DKRP++SD+RESGSIEQDADIVAFLYRD YY+ D E E++ E+I + KM 
Sbjct: 368 GVEQRQDKRPVLSDIRESGSIEQDADIVAFLYRDDYYRKECDDAEEAVEDKTIEV1LEKN 427 

Query: 419 RHGNLGTVKLYFHKEYTKFSSVEE 442 

R G GTVKL F KEY KFSS+ + 
Sbjct: 428 RAGARGTVKLMFQKEYNKFSSIAQ 451 

There is also homology to SEQ ID 2424: 

Identities = 284/444 (63%), Positives = 357/444 (79%), Gaps = 4/444 (0%)

Query: 3
Sbjct: 11

Query: 63
           D TI++ L+ D+L +GG+SYI E+VN+VPTS++AE+YAKIVA+KA LR II L++S

Query: 123
           +E+IA ER+LIE+++ SN+S FR I DVL N+ +E RS TS +TG
Sbjct: 131

Query: 183 ...FSLEMGAE 242
           + TGF D DK+ TGLH DQL++LAARPA+GKTA LNIAQNV TK K
Sbjct: 191

Query: 243
           SLV+RML+AEG + +H +RTG LT +W + AQG LAEAPI+IDDT G+KIT+IRAR+
Sbjct: 251

Query: 303
           R+LSQE D GLGLIVIDYLQLI G++ +NRQQEVS+ISRQLKI+AKELKVPVIALSQLSR
Sbjct: 311

Query: 362
           GVEQR DKRP++SD+RESGSIEQDADIVAFLYRD YY+ D E E++ E+I+ KN
Sbjct: 371

Query: 419
           GTVKL F KEY KFSS+ +
Sbjct: 431 ...RGTVKLMFQKEYNKFSSIAQ 454

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 792 

A DNA sequence (GBSx0840) was identified in S.agalactiae <SEQ ID 2425> which encodes the amino 
acid sequence <SEQ ID 2426>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results
bacterial cytoplasm --- Certainty=0.2146 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10167> which encodes amino acid sequence <SEQ ID
10168> was also identified.

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 793 

A DNA sequence (GBSx0841) was identified in S.agalactiae <SEQ ID 2427> which encodes the amino 
acid sequence <SEQ ID 2428>. Analysis of this protein sequence reveals the following: 

Possible site: 15
>>> Seems to have no N-terminal signal sequence

Final Results
bacterial cytoplasm --- Certainty=0.2774 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 794 

A DNA sequence (GBSx0842) was identified in S.agalactiae <SEQ ID 2429> which encodes the amino 
acid sequence <SEQ ID 2430>. Analysis of this protein sequence reveals the following: 

Possible site: 28
>>> Seems to have no N-terminal signal sequence
INTEGRAL Likelihood = -1.91 Transmembrane 63 - 79 ( 62 - 79)

Final Results
bacterial membrane --- Certainty=0.1765 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 8661> which encodes amino acid sequence <SEQ ID 8662>
was also identified. Analysis of this protein sequence reveals the following:

Lipop: Possible site: -1 Crend: 10
McG: Discrim Score: -11.31
GvH: Signal Score (-7.5): -1.86
Possible site: 28
>>> Seems to have no N-terminal signal sequence
ALOM program count: 1 value: -1.91 threshold: 0.0
INTEGRAL Likelihood = -1.91 Transmembrane 61 - 77 ( 60 - 77)
PERIPHERAL Likelihood = 9.92 19
modified ALOM score: 0.88

*** Reasoning Step: 3

Final Results
bacterial membrane --- Certainty=0.1765 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAB18686 GB:U38906 ORF11 [Bacteriophage r1t]
Identities = 101/249 (40%), Positives = 157/249 (62%), Gaps = 21/249 (8%)

Query: ^QHSMFSHKITETDRFLEMeLSSQALYB'HUWGADDEGFIDKAKTIQRTIGRSDDDMKL 62
       MAQRRM ++ +T +FL +PL +QALYFHL + ADD+G ++ A + R +GA++D + L
Sbjct: MAQRRMIDKRTIQTQKFLRLPLETQMjYFHLMLjMADDDGWE -AFPVVRMVGAAEDSLGIi 59

       - IPY EI+ YLN+K R++R N++

       K LIKARW+EG++L+DFK V+D V +WSG + E YL+P+TLF +KF+ YUNQ

No corresponding DNA sequence was identified in S.pyogenes. 

SEQ ID 8662 (GBS344) was expressed in E.coli as a His-fusion product. SDS-PAGE analysis of total cell 
extract is shown in Figure 72 (lane 12; MW 30.9kDa). It was also expressed in E.coli as a GST-fusion 
product. SDS-PAGE analysis of total cell extract is shown in Figure 81 (lane 3; MW 59kDa). 

The GBS344-GST fusion product was purified (Figure 213, lane 3; Figure 226, lanes 4-6) and used to
immunise mice. The resulting antiserum was used for FACS (Figure 271), which confirmed that the protein
is immunoaccessible on GBS bacteria.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 795 

A DNA sequence (GBSx0843) was identified in S.agalactiae <SEQ ID 2431> which encodes the amino 
acid sequence <SEQ ID 2432>. Analysis of this protein sequence reveals the following: 

>>> Seems to have no N-terminal signal sequence

Final Results
bacterial cytoplasm --- Certainty=0.2549 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database: 



Query: 12 Vl.EETCETOGCQLWLTKVPIKGRlL^ELKQCPECTKAAINIFENKIiNSQSKINSKrADTYA 71
          VLE+ C HG L +T +G E++ CP+C A+ + + + + +++ S +A
Sbjct: 16 VLEQKCSKHGLNL-ITYKNHEG--EQVTCCPQCQAEALEVLQERFDQKAR-QSIIARK-- 69

Query: 72
          F +S3j + K+ + + +E IK ++ A+ +A + + A++
Sbjct: 70 - FRENSIANS KMWKCTFDTFEAQPGS AEEij I KGQVRJ3AAVAFATKPVAHH - AVL 121

Query:     TGPSGVGKSHLTYGlAKFMIffiQFKAYESPKBVIjFISIjVSLFTKIKESFKOT)NC3Y-RQM)M 181
           G G GKSffla A M ++ + K++ FI++ LF+KIK SF ■*- Y +
Sbjct: 122 YGQPGAGKSHt AMAMMQEIHKHRPTKTMAFIHISELFSKIKNSFDDPSEYl'JTKEKA 177

Query: 182 IELLTRVDYIiFLDDLGKESRKGDS- -O^HE^QILYEILDNHSKTIIKTOLSSKEIKALY 240
           +E++ VD L +DDLG EE G + + +W ++Y++L+N+ II TNLS +E+K +Y
Sbjct: 178 LEIMRGVDLLCIDDLGTESSMGRTGQEATKWAQDVIYDVLENQDRIIITTNLSERELKRVY 238

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.



Example 796 

A DNA sequence (GBSx0844) was identified in S.agalactiae <SEQ ID 2433> which encodes the amino 
acid sequence <SEQ ID 2434>. This protein is predicted to be methyl transferase. Analysis of this protein 
sequence reveals the following: 

Possible site: 47
>>> Seems to have no N-terminal signal sequence

Final Results
bacterial cytoplasm --- Certainty=0.1241 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>



A related GBS nucleic acid sequence <SEQ ID 10165> which encodes amino acid sequence <SEQ ID
10166> was also identified.

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAC98421 GB:L29323 methyl transferase [Streptococcus pneumoniae] 
Identities = 262/474 (55%), Positives = 313/474 (65%), Gaps = 71/474 (14%) 

Query: 2 MKFLDLFAGIGGFRLGMEQAGHECIGFCEINKFARASYKVIHDTEGEIELHDITRVSD-E 60
         M+F+DLF+GIGGFRLGME GHECIGFCEI+KFAR SYK I TEGEIE HDI VSD E
Sbjct: 1 MRFIDLFSGIGGFRLGMESVGHECIGFCEIDKFARESYKSIFQTEGEIEFHDIRDVSDDE 60

Query: 61 FIRGIGSVDVICXMFPCQAFSIAGI^RGFSDTRGTI.FFEIARFASILRPKYLFbENVKGl, 120
          F 4 G VDVICGGFPCQAFSIAG R GFEDTRGTLFFEIAR A ++P++LFIjENVKGIj
Sbjct: 61 FKMjRGKTOVICGGFPCQAFSIAGRRLGFEDTRGTIiPFEIARAAKQIQPRFLFIjENVKGL 120

Query: 121 IJfflEGGATFETIIRTIjDELGYIIVEWQIFKSKNFGVPQIIRERVFIIGHLRGEGTRPIFPFE 180
           13!M+ G TF TI+ TLDELG+ +VEWQ+ NSK+FGVPQHRERVFIIGH R GTR FPF
Sbjct: 121 LJSHDKGRTFTTILTTLrDELGFDVEWOMLNSKDFGVPQNRERVFIIGHSRKRGTRLGFPFR 180

Query: 181 SS1TENYPIHTRKIGNVNPSGMGMKGEVYDSEGLSPTLTTO3KGEGVKIAVN 231
           P + +GN+NPS +GM+G+VY SEGL+PTL KGEG KIA+
Sbjct: 181 REGC^TNPETDKILGNrjKPSKSGMSGKVYYSEGIAPTLVRGKGEGFKIAIPCfWPDRLDK 240

Query: 232 -- WGRnPGKFEMPNRVYDPDGLAPTIRTMQGGGLE 265
           VVG LP F+ RVY +GI.+PT+ TMQGG
Sbjct: 241 RQHGRRFKDNQEPMFTUm^RHGrtfVV^^ 300

Query: 266 PKIIQRGRGYNOGGEYEISPTTOCNSWQENKL^^ 325
           p K i + + LK++ERTKKGY++AE GDS+NL P+S+
Sbjct: 301 PKIUP EPIQFLKVREATKKGYAQAEIGDSIKLERPSSQ 333

Query: 326 TRRGRVGKGIACTTLLTGEEQGVW--YDLYNRRKKDIVGTIjTASGHNGNTTTGTFGISNG 383
           RRGRVGKGIANTL T + GVW Y+ +++ + G L G
Sbjct: 340 HRRGRVGKGIMJTLTTSGQMGWl'ASYEGEDKQVYQVaGVLID GQFYR 387

Query: 384 FRIRKLTPRECWRLQGFPDWAPDKASQTOSNSQLYKQAGNSVTVNVIAAIflERL 437
           RIR++TP+EC+RLQGFPDWAF+ A +V+SNSQLYKQAGNSVTV VIAAIA++L
Sbjct: 388 LRIRRITPKECFRLQGFPDWAFEAAR1OTSSNSQLYKQAGNSVTVPVIAAIAKKL 441

There is also homology to SEQ ID 2436: 

Identities = 53/75 (70%), Positives = 62/75 (82%), Gaps = 1/75 (1%)

Query: 2 MKFLDLFAGIGGFRLGMEQAGHECIGFCEINKFARASYKVIHDTEGEIELHDITRVSDEF 61
         MKFLDLFAGIGGFRLG+ HECIGFCEI+KFAR SYK I++TEGEIE HDI +V+D+
Sbjct: 4 MKFLDLFAGIGGFRLGLINQCHECIGFCEIDKFARQSYKAIYETEGEIEFHDIRQVTDQD 63

Query: 62 IRGI-GSVDVICGGF 75
          R + G VD+ICGGF
Sbjct: 64 FRQLRGQVDIICGGF 78

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 797 

A DNA sequence (GBSx0845) was identified in S.agalactiae <SEQ ID 2437> which encodes the amino
acid sequence <SEQ ID 2438>. Analysis of this protein sequence reveals the following:

Possible site: 29
>>> Seems to have no N-terminal signal sequence

Final Results
bacterial cytoplasm --- Certainty=0.2585 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 798 

A DNA sequence (GBSx0846) was identified in S.agalactiae <SEQ ID 2439> which encodes the amino 
acid sequence <SEQ ID 2440>. This protein is predicted to be arpR protein. Analysis of this protein 
sequence reveals the following: 

Possible site: 46
>>> Seems to have no N-terminal signal sequence

Final Results
bacterial cytoplasm --- Certainty=0.5070 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAB09197 GB:U24159 orfl2 [Bacteriophage HP1]
Identities = 34/69 (49%), Positives = 47/69 (67%), Gaps = 1/69 (1%)

Query: 1 MTKTMTLEEKVEQWFIDKNLHE-ANPVKQFQKLIEETGELYSGIAKGKSEIIRDSLGDMQ 59
         M L + +EQW DRNL E + P EQF KL+EE GEL SG+AK K ++I+DS+GD
Sbjct: 1 MADLQQLIKNIEQWAEDRNLVEDSTPQKQFIKLMEEFGELCSGVAKNKPDVIKDSIGDCF 60

Query: 60 WLIGIEQQ 68
          W++ + +Q
Sbjct: 61 WMVILAKQ 69

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 799

A DNA sequence (GBSx0847) was identified in S.agalactiae <SEQ ID 2441> which encodes the amino 
acid sequence <SEQ ID 2442>. Analysis of this protein sequence reveals the following: 

Possible site: 58
>>> Seems to have an uncleavable N-term signal seq
INTEGRAL Likelihood = -5.10 Transmembrane 13 - 29 ( 10 - 36)

Final Results
bacterial membrane --- Certainty=0.3039 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAD21919 GB:AF085222 unknown [Streptococcus thermophilus bacteriophage DT1]
Identities = 31/67 (46%), Positives = 49/67 (72%), Gaps = 1/67 (1%)

          ++ + ++++ ADN E+ GK+T K ++ +T+ GAYGKFLV+ 1-EQY+ 1 VGD-) IP
Sbjct: 34 NRPVEAIVvHKMDNF-VEDHGKVTGKSMVGKLYTIDCGAYGKFLVSKEQYDSVQVGDEIP 92

Query: 102 DYLKGRG 108
           YLKGRG
Sbjct: 93 SYLKGRG 99

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.
Example 800 

A DNA sequence (GBSx0848) was identified in S.agalactiae <SEQ ID 2443> which encodes the amino
acid sequence <SEQ ID 2444>. This protein is predicted to be gene 1.7 protein. Analysis of this protein
sequence reveals the following:

Possible site: 55
>>> Seems to have no N-terminal signal sequence

Final Results
bacterial cytoplasm --- Certainty=0.5428 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:CAA24397 GB:V01146 gene 1.7 [Bacteriophage T7]
Identities = 30/72 (41%), Positives = 40/72 (54%)

Query: 47 DNVNYPSHYQGIOTGLESIDVLRNFMTPEMIjKGFyLGNALKYQLRYRKKNGLEDLKKARKN 106
          + V PSHY +E+I+V+ MT E KG+ GN LKY+LR KK+ h L+K
Sbjct: 120 EGVTKPSHYMLFDDIEAIEVIARSMTVEQPKGYCFGNILKYRLRAGKKSELAYLEKDLAK 179

Query: 107 LDWMEEMEKEK 118
           D+ E EK K
Sbjct: 180 ADFYKELFEKHK 191

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 801 

A DNA sequence (GBSx0849) was identified in S.agalactiae <SEQ ID 2445> which encodes the amino 
acid sequence <SEQ ID 2446>. Analysis of this protein sequence reveals the following: 

Possible site: 28
>>> Seems to have no N-terminal signal sequence

Final Results
bacterial cytoplasm --- Certainty=0.1375 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 802 

A DNA sequence (GBSx0850) was identified in S.agalactiae <SEQ ID 2447> which encodes the amino 
acid sequence <SEQ ID 2448>. Analysis of this protein sequence reveals the following: 

Possible site: 31
>>> Seems to have no N-terminal signal sequence

Final Results
bacterial cytoplasm --- Certainty=0.0087 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10163> which encodes amino acid sequence <SEQ ID
10164> was also identified.

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAF26608 GB:AF145054 ORF9 [Streptococcus thermophilus bacteriophage 7201]
Identities = 99/148 (66%), Positives = 116/148 (77%), Gaps = 10/148 (6%)

Query: 5 MINNVVI,IGRLTRDvELRYTPSNIANATFNIAVNRN^ 64
         MINN VL+GRLT+D E +YT SNIA A+F+LAVNRNFK+A G+READFINCV+WRQQAEN
Sbjct: 1 MINNTTOVGRLTKDPEFKYTGSNIAVASFSIAVNRNFKDANGEREADFINCVIWRQQAEN 60

Query: 65 lANWTKKGMLIGITGRIQTRSYENCjQGQRIYVTEVVADSFQILEKR DNSTNQASMD 120
          LANW KKG LIGITGRIQTRSYENQQGQR+YVTEVVA++FQ+LE R + N +
Sbjct: 61 LANWAKKGALIGITGRIQTRSYmQQGQRVYVTEWAENFQMLESRAAREGGNANNSYSQ 120

Query: 121 DQLP PSFGNSQPMDISDDDLPF 142
           Q+P + N QP+DIS DDLPF
Sbjct: 121 QQVPNFARKNTEYSNKQPLDISSDDLPF 148

There is also homology to SEQ ID 1492. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 803 

A DNA sequence (GBSx0851) was identified in S.agalactiae <SEQ ID 2449> which encodes the amino
acid sequence <SEQ ID 2450>. This protein is predicted to be puff C4B protein. Analysis of this protein
sequence reveals the following:

Possible site: 19
>>> Seems to have no N-terminal signal sequence

Final Results
bacterial cytoplasm --- Certainty=0.1203 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10161> which encodes amino acid sequence <SEQ ID 
10162> was also identified. 

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes. 

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for
vaccines or diagnostics.

Example 804 

A DNA sequence (GBSx0852) was identified in S.agalactiae <SEQ ID 2451> which encodes the amino
acid sequence <SEQ ID 2452>. This protein is predicted to be F5M15.19. Analysis of this protein sequence
reveals the following:

Possible site: IS
>>> Seems to have an uncleavable N-term signal seq
INTEGRAL Likelihood = -2.34 Transmembrane 7 - 23 ( 6 - 23)

Final Results
bacterial membrane --- Certainty=0.1935 (Affirmative) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>
bacterial cytoplasm --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 805

A DNA sequence (GBSx0853) was identified in S.agalactiae <SEQ ID 2453> which encodes the amino 
acid sequence <SEQ ID 2454>. Analysis of this protein sequence reveals the following: 

Possible site: 54
>>> Seems to have no N-terminal signal sequence

Final Results
bacterial cytoplasm --- Certainty=0.4398 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10159> which encodes amino acid sequence <SEQ ID
10160> was also identified.

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 806 

A DNA sequence (GBSx0855) was identified in S.agalactiae <SEQ ID 2455> which encodes the amino
acid sequence <SEQ ID 2456>. Analysis of this protein sequence reveals the following:

Possible site: 58
>>> Seems to have no N-terminal signal sequence

Final Results
bacterial cytoplasm --- Certainty=0.2992 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 807 

A DNA sequence (GBSx0856) was identified in S.agalactiae <SEQ ID 2457> which encodes the amino
acid sequence <SEQ ID 2458>. Analysis of this protein sequence reveals the following:

Possible site: 54
>>> Seems to have no N-terminal signal sequence

Final Results
bacterial cytoplasm --- Certainty=0.4639 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:BAB07758 GB:AP001520 unknown conserved protein [Bacillus halodurans]
Identities = 65/184 (35%), Positives = 102/184 (55%), Gaps = 6/184 (3%)

Query: 1 MNIVEPLRDKDDIC^MKDXI^^ 60
         M V P RD D IQA+K L + + Y+LF +GINTG R+ 4L LK+KDV
Sbjct: 1 MEYWPFRDVDQIQAIKRSLKKKSPRDYLLFTIGINTGLRISQLLALKIKDVYDGQKPKD 60

Query: 61 EQKTGKYKSIKMTRPLKNELR EFVKDKELHEYLFQSRVGKNKALSYKTVYWFLKRAA 117
          + + + + +K L+ F++ +E H LF S ++ ++ + Y +K+AA
Sbjct: 61 YLQLESGE1VYLNDQVKKALQFYAHF1EFQEQH-CLFAS-TNPDQPMTRQHAYRIIKQAA 118

Query: 118 EDLGI-DOTGTHTMRKTFGYHYYKKYKNVADLMSLFNHSSPAVTLIYICWQDELDTKMS 176
           +G+ D +GTHT+RKTFGYH Y++ ++ L FNH +PA TL YI + ++E
Sbjct: 119 LQVGLTDQIGTHTLRKTFGYHAYRQGVALSLLQQRFlfflQTPAQTLRYIDIAKNEQTIPRI 178

Query: 177 NFSL 180
           N +L
Sbjct: 179 NVNL 182

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 808 

A DNA sequence (GBSx0857) was identified in S.agalactiae <SEQ ID 2459> which encodes the amino
acid sequence <SEQ ID 2460>. Analysis of this protein sequence reveals the following:

Possible site: 33
>>> Seems to have no N-terminal signal sequence

Final Results
bacterial cytoplasm --- Certainty=0.3582 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 809

A DNA sequence (GBSx0858) was identified in S.agalactiae <SEQ ID 2461> which encodes the amino
acid sequence <SEQ ID 2462>. Analysis of this protein sequence reveals the following:

Possible site: 33
>>> Seems to have no N-terminal signal sequence

Final Results
bacterial cytoplasm --- Certainty=0.2732 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 810

A DNA sequence (GBSx0859) was identified in S.agalactiae <SEQ ID 2463> which encodes the amino 
acid sequence <SEQ ID 2464>. Analysis of this protein sequence reveals the following: 

Possible site: 27
>>> Seems to have no N-terminal signal sequence

Final Results
bacterial cytoplasm --- Certainty=0.1720 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has no significant homology with any sequences in the GENPEPT database.

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that this protein and its epitopes could be useful antigens for
vaccines or diagnostics.

Example 811 

A DNA sequence (GBSx0860) was identified in S.agalactiae <SEQ ID 2465> which encodes the amino 
acid sequence <SEQ ID 2466>. Analysis of this protein sequence reveals the following: 

Possible site: 26
>>> Seems to have no N-terminal signal sequence

Final Results
bacterial cytoplasm --- Certainty=0.2619 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

A related GBS nucleic acid sequence <SEQ ID 10157> which encodes amino acid sequence <SEQ ID 
10158> was also identified. 

The protein has no significant homology with any sequences in the GENPEPT database. 

No corresponding DNA sequence was identified in S.pyogenes.

Based on this analysis, it was predicted that these proteins and their epitopes could be useful antigens for 
vaccines or diagnostics. 

Example 812 

A DNA sequence (GBSx0861) was identified in S.agalactiae <SEQ ID 2467> which encodes the amino
acid sequence <SEQ ID 2468>. This protein is predicted to be a terminase large subunit. Analysis of this
protein sequence reveals the following:

Possible site: 13
>>> Seems to have no N-terminal signal sequence

Final Results
bacterial cytoplasm --- Certainty=0.2753 (Affirmative) < succ>
bacterial membrane --- Certainty=0.0000 (Not Clear) < succ>
bacterial outside --- Certainty=0.0000 (Not Clear) < succ>

The protein has homology with the following sequences in the GENPEPT database.

>GP:AAC27181 GB:AF009630 putative terminase subunit [bacteriophage bIL170]



